A CONCEPTUAL STUDY OF 


AUTOMATIC AND SEMI-AUTOMATIC 
QUALITY ASSURANCE TECHNIQUES 
FOR GROUND IMAGE PROCESSING 


Prepared for: 

National Aeronautics and Space Administration 
Goddard Space Flight Center 
Greenbelt, Maryland 


Under Contract Number 
NAS 5-27513 


by:

Engineering and Economics Research, Inc. 
1951 Kidwell Drive
Vienna, Virginia 


September 30, 1983 



FOREWORD 


This report summarizes the results of a study conducted by Engineering
and Economics Research (EER), Inc. under NASA Contract Number NAS 5-27513. 

The study involved the development of preliminary concepts for automatic 
and semi-automatic quality assurance (QA) techniques for ground image proces- 
sing. EER was supported by MRJ Incorporated, as subcontractor, in this 
study and we acknowledge the valuable contributions of Dr. Edward McMahon 
in this effort. 

EER acknowledges the valuable assistance of Mr. Joseph Heinig of NASA/
GSFC, the contract technical officer. In addition, the comments,
constructive criticism, and guidance provided by Mr. Paul Heffner, Mr.
Fred McCaleb, and Mr. Gerald Grebowsky of NASA/GSFC during the course of
this study were very useful.



TABLE OF CONTENTS 


PAGE 

LIST OF FIGURES v 

LIST OF TABLES vi 

1.0 SCOPE 1-1 

1.1 Summary 1-1

2.0 DEFINITIONS 2-1

2.1 Quality Assessment and Quality Assurance 2-1

2.2 Automated QA 2-2

2.3 Semi-Automated QA 2-3

2.4 QA of System Components 2-4 

2.4.1 QA of Hardware Components 2-4 

2.4.2 QA of Software Components 2-4 

2.5 QA of the Process 2-6 

2.6 Measures of Quality Assessment 2-7 

3.0 GENERIC SYSTEM 3-1 

3.1 System Description 3-1 

3.2 Sources of Error 3-3 

3.2.1 Data Source 3-3 

3.2.2 Transmission Links 3-4 

3.2.3 Initial Processing 3-4 

3.2.3.1 Calibration 3-5

3.2.3.2 Destriping 3-5

3.2.3.3 Geometric and Radiometric Correction;

Registration and Reformatting 3-6

3.2.3.4 Annotation 3-6

3.2.4 Storage and Retrieval System 3-7

3.2.4.1 Storage System 3-7

3.2.4.2 Reproduction 3-7

3.2.4.3 Special Processing 3-8

3.2.4.4 Request Processing 3-8

3.3 Quality Assurance Systems 3-8

3.3.1 QA Algorithms 3-9

3.3.2 Sensing Parameters of QA 3-10

3.3.3 Failure and Warning Reports 3-11

3.3.4 Example QA Process 3-11 

3.4 Data Types 3-13 

3.4.1 Sensor Data 3-14 

3.4.2 Test Data 3-15 

3.4.3 Calibration Data 3-16 

3.4.4 Support Data 3-17 

4.0 AUTOMATED QUALITY ASSURANCE 4-1 



4.1 Data Access 4-1 

4.2 MIS for Reporting and Tracking 4-2 

4.3 Pertinent Functions of AQA 4-7

4.3.1 Known Functions 4-7

4.3.1.1 Calibration 4-7

4.3.1.2 BER 4-8

4.3.1.3 Destriping 4-8

4.3.2 Adaptive Functions 4-9

4.3.2.1 Pattern Classification Techniques 4-14

4.3.2.2 Learning in Linear Classifier 4-20

4.3.2.3 Statistical Decision Techniques 4-21

4.3.2.4 Sequential Decision Techniques 4-23

4.3.2.5 Learning in Sequential Pattern

Recognition Systems 4-29

4.3.2.6 Summary 4-31

5.0 USE OF QUALITY ASSESSMENT IN QUALITY ASSURANCE 5-1 

5.1 Sampling as a Tool 5-1 

5.1.1 Adaptive Sampling 5-7 

5.2 Other Parameters 5-7 

5.2.1 Costs 5-7 

5.2.2 Other Measures 5-8 

6.0 SURVEY OF STATE-OF-ART 6-1 

6.1 Literature Survey 6-1

7.0 CONCLUSIONS 7-1 

7.1 Recommendations 7-1 

8.0 BIBLIOGRAPHY 8-1 





LIST OF FIGURES 


FIGURE 1.1 GENERAL DATA FLOW PROCESS WITH CENTRAL QA FUNCTION 1-5 

FIGURE 3.1 GENERIC DATA FLOW PROCESS 3-2 

FIGURE 3.3.1 EXAMPLE QA PROCESS IN A GENERIC SYSTEM FRAGMENT 3-12 

FIGURE 3.4.1 EXAMPLE OF ADAPTIVE THRESHOLD MONITORING 3-19 

FIGURE 4.2.1 A MIS FOR GENERIC IMAGE PROCESSING SYSTEM 4-4 

FIGURE 4.3.1 A PATTERN RECOGNITION SYSTEM 4-11 

FIGURE 4.3.2 AI APPROACH TO QA 4-15 

FIGURE 4.3.3 A PATTERN CLASSIFIER 4-17 

FIGURE 4.3.4 A DECISION BOUNDARY IN A 2-DIMENSIONAL FEATURE SPACE 4-18

FIGURE 4.3.5 A DISTANCE CLASSIFIER 4-24 

FIGURE 5.1 COST TRADE-OFF MODEL 5-9 





LIST OF TABLES 


Page 

TABLE 1 PROBABILITY OF X ERRORS IN SAMPLE GIVEN N ERRORS 

IN POPULATION 5-5 

TABLE 2 PROBABILITY OF N ERRORS IN POPULATION GIVEN X 

ERRORS IN SAMPLE (X = 0,1,2) 5-6 



1.0 SCOPE 


This report addresses automation of quality assurance and quality
assessment techniques in a scientific sensor data processing system. A
distinction is made between quality assessment and the more comprehensive
quality assurance, which includes decision making and system feedback control
in response to quality assessment.

The philosophy of automated quality assessment (QA) is the main subject
of this study. Some examples of automated QA are given, but they are
speculative. It is difficult to give attractive and feasible techniques without
concentrating on a specific system design; however, the principles espoused
herein as philosophies need to be recognized at the beginning of the design
of future systems which attempt to incorporate automated QA.

1.1 Summary

An automated QA system should be: 

o Designed integrally with the processing system. Access of the
QA function to data, knowledge of input data quality, and availability
of test sequences are requirements for the QA function design.

o Easily managed. Trends, failures, and the status of changes to the
system to improve quality or correct failures must be logged and
tracked through a management information system.

o Readily maintained and modified. The QA function should accept new
algorithms and modifications as quality problems emerge from a maturing
system. Having access to all data paths and a library of simple measures
will allow "artificial intelligence" concepts such as learning
to be studied.

There are obviously different levels of QA and different amounts of
automation which may be used to implement QA.

The minimum level of QA is output product inspection. With this level, 
little more can be done than to reject bad products, request regeneration, 
and report the error to some maintenance or repair function. Failures 
of a production line are detected by operators or by lack of output and 
are handled the same way. Higher levels of QA are inspection of intermediate 
products, trend analyses on product quality parameters, and running and 
analyzing test data. As with failure reports, warning reports are generated
for the maintenance and repair function.

Automating these QA functions requires:

1. Sensing the parameters used by a QA analyst.

2. Emulating the algorithm used by the QA analyst to reach a decision.

3. Generating failure or warning reports; tracking progress and
closeout of outstanding reports.

Steps 1 and 3 can be accomplished easily within the state-of-the-art.

Step 3 is a management information system which is not difficult to
implement (or purchase and modify).

Step 1 requires that the QA function have access to the pertinent 
data. This may be difficult in certain cases, but should be feasible if 
considered in the initial design of a system. 

The major difference between a human analyst QA function and an automated
function is in Step 2. A human analyst is adaptive.

An improvement can be made to some existing QA functions simply by
providing automated Steps 1 and 3, but it is also possible to program known,
planned QA functions such as checks on data bounds, format, data counts,
trends, statistics (destriping), etc. In addition, as new QA evaluation
functions are discovered (as a result of, say, failures or degradation
of some system component), they may be added to a repertoire of QA algorithms.

It is difficult to program a general adaptive process. One approach 
to an adaptive process is a general learning program, which is "taught" 
or trained by an experienced analyst as to what is "good" and what is "bad"
output. Such a program would have to maintain many statistics on each 
data flow. Candidate statistics could be max, min, mean, mode, second 
and third central moments, correlation between data items, and autocorrelation 
function over a fixed length sample. 
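
The candidate statistics listed above could be collected for one data flow
roughly as follows. This is a minimal sketch, not a method given in this
report: the function names, the dictionary keys, the population (divide-by-n)
form of the moments, and the default of three autocorrelation lags are all
assumptions made for illustration.

```python
import math
from collections import Counter


def candidate_statistics(samples, max_lag=3):
    """Candidate statistics for one fixed-length sample of a data flow."""
    n = len(samples)
    mean = sum(samples) / n
    # Second and third central moments (population form, divisor n).
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m3 = sum((v - mean) ** 3 for v in samples) / n
    # Mode: most frequent value; ties go to the value seen first.
    mode = Counter(samples).most_common(1)[0][0]
    # Normalized autocorrelation at lags 1..max_lag.
    autocorr = []
    for lag in range(1, max_lag + 1):
        cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
                  for i in range(n - lag)) / n
        autocorr.append(cov / m2 if m2 else 0.0)
    return {
        "max": max(samples),
        "min": min(samples),
        "mean": mean,
        "mode": mode,
        "second_central_moment": m2,
        "third_central_moment": m3,
        "autocorr": autocorr,
    }


def correlation(xs, ys):
    """Pearson correlation between two equal-length data items."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    # A constant series has no defined correlation; report 0.0 by convention.
    return sxy / denom if denom else 0.0
```

A learning program of the kind described would keep such a record for each
data flow and compare records for output the analyst labels "good" with
records for output labeled "bad."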

Under training of a "bad" output, a learning program would correlate
differences in statistics between good and bad data to the definition of
"bad." It appears more cost effective at this time to plan for assisted
automated QA rather than attempting (and having to rely on) adaptive fully
automated QA functions.

Figure 1-1 presents the concept of an integrated quality assessment
system in a generic data flow process. The pertinent features are that
the quality assessment function is not done piecemeal throughout the system.
It has access to all data, and it has input to the process control function
to enable quality assurance.

The presentation of the QA function as a single function does not 
preclude implementation of various subfunctions in distributed processors, 
perhaps coresident in the processors implementing various main-line functions 
in the system, but any QA processes must communicate to each other or to 
a supervisory QA function and a QA management information system. 

Access to the data is obviously necessary, and the processing system
must be designed with this access and overhead considered. The access
does not have to be constant, and should be under the control of the QA
process. The amount of data needed to be processed by the QA function
will vary depending on system status and history. (For example, a low
sample rate would be appropriate when the system has a history of nominal
behavior.) Therefore, the ability to supply data should be designed into
the processing system; the ability to select data for assessment should
be built into the QA function.
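
As an illustration of data selection under control of the QA function, the
sketch below picks the fraction of data to assess from the recent pass/fail
history. The function name, the default rates, and the linear rule are
assumptions chosen for brevity; a real system would set them from its own
cost trade-offs.

```python
def sample_fraction(recent_outcomes, low=0.01, high=1.0):
    """Fraction of data the QA function asks for, given recent QA outcomes.

    recent_outcomes is a list of booleans (True = nominal). With no history
    the highest rate is used; a fully nominal history gives the lowest rate.
    """
    if not recent_outcomes:
        return high
    failures = sum(1 for ok in recent_outcomes if not ok)
    failure_fraction = failures / len(recent_outcomes)
    # Hypothetical rule: scale linearly between the low and high rates.
    return low + (high - low) * failure_fraction
```

With twenty nominal outcomes in a row this rule returns the low rate; each
recorded failure in the window raises the rate toward full inspection.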

The most fruitful area for automated QA is in trend analysis on data
from many points in the data flow. Catastrophic failures are simple to
detect, and random unimportant bit errors can be tolerated (and are expensive
to detect). Day-to-day QA requires algorithms to detect when quality measures
cross some threshold between acceptable and unacceptable.

FIGURE 1-1. GENERIC DATA FLOW PROCESS WITH CENTRAL QA FUNCTION
(legible block labels: DATA SOURCE, TRANSMISSION LINKS, INITIAL PROCESSING,
STORAGE, OUTPUT ACCEPTANCE)


When quality measures are computed regularly, it is easy to perform 
trend analyses, and to anticipate or predict when some quality measure 
may be approaching a threshold of unacceptability. Appropriate action 
may then be taken to correct the cause of the trend. 
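
One simple way to anticipate a threshold crossing is to fit a straight line
to the regularly computed quality measure and extrapolate. The sketch below
assumes a linear trend and uses an ordinary least-squares fit; the function
names are illustrative, and nothing in this report prescribes this particular
predictor.

```python
def fit_trend(times, values):
    """Least-squares straight line through (time, value) pairs.

    Assumes at least two distinct times. Returns (slope, intercept).
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def predicted_crossing(times, values, threshold):
    """Time at which the fitted trend reaches the threshold, or None."""
    slope, intercept = fit_trend(times, values)
    if slope == 0:
        return None  # flat trend never reaches the threshold
    t_cross = (threshold - intercept) / slope
    # Only a crossing later than the last measurement is a prediction.
    return t_cross if t_cross > times[-1] else None
```

A returned time gives the lead available for corrective action; None means
the fitted trend is flat or moving away from the threshold.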

The algorithm used to compute quality measures may be based on knowledge
of the process and quality measures, or may have to be general statistical
computations from which an adaptive function can "learn" differences in
acceptable and unacceptable products.

Knowledge-based algorithms require more design and analysis, and less 
computing time than adaptive functions. Knowledge-based algorithms reliably 
give answers according to the rules specified (which may be incomplete 
or incorrect); adaptive functions can be "trained" to follow changing standards 
of quality (at some lag in time), but give the desired answer with some 
probability less than one. Crop recognition algorithms are good examples 
of adaptive algorithms. 

The concept of automated quality acceptance and assurance presented 
here is treated as a system-level function. The interfaces between QA 
and the Data Processing Functions are complex, and QA would benefit by 
being designed in conjunction with the Data Processing Functions. However, 
the utility of an automated QA system depends on the use made of its
output by the Production Control Function, so there is cause to include
QA in Production Control. The treatment of the QA process ultimately depends 
on the philosophy of the system sponsor, and, in any disposition, relies 
on system engineering to properly integrate and trade QA system costs and 
benefits. 





2.0 DEFINITIONS 


2.1 Quality Assessment and Quality Assurance

The term Quality Assurance is generally accepted to mean the program
within a system which ensures that the quality of the product meets minimum
standards. Implied in this term is the measurement of quality of the
product, the decision to act or not to change the system, and a means of
changing the system to adjust the quality of the product. These three components
of Quality Assurance will be referred to as:

1. Quality Assessment

2. Production Management Responses 

3. Corrective Action 

Quality Assessment is the kernel of the problem being studied. It
can be a manpower intensive effort if capabilities to automatically monitor
product quality are not in place.

Production Management Responses to a measure of product quality are 
usually constrained by policy, and production management is expected to
be able to recognize or discover what faults cause unacceptable quality 
and what adjustments, corrections, or repairs are necessary to restore qual- 
ity output. Production management may have the option of deciding to ship 
or reprocess marginal quality products based on cost, time to reprocess, 
and knowledge of customer requirements. 


The corrective actions which may be taken to adjust the quality of
the output are system specific. Such actions as repair of failed components, 
realignment and recalibration, cleaning, and special processing are included 
in this class of actions in addition to actions such as control parameter 
adjustments. Software errors and operator mistakes require corrective ac- 
tion. 


Corrective action is generally not under control of Quality Assurance 
even though Quality Assurance requests corrective actions. Corrective
actions may require an engineering change request, software modification with
configuration control approval, repair and maintenance action, procedural
change, or personnel training. Other (usual) corrective action requires
and must be included in the specific system implementation. Quality assur- 
ance includes feedback control; quality assessment does not. 

This study will concentrate on automation of quality assessment and 
will use the acronym QA to refer to quality assessment. 

2.2 Automated QA

To distinguish between automated and semi-automated QA, automated QA
is defined to encompass all QA done in the system without operator
intervention. Thus, the assessment of quality determined by an algorithm leads
to product acceptance and rejection, reprocessing orders, and repair orders.
This is a reasonable goal, but the state-of-the-art in artificial intelligence
will not now reasonably support this goal for a new system. The problem
lies in the probability that a new system will experience unexpected errors
and the difficulty in generating a complete set of product acceptance
criteria. The need to automatically control and correct the process for
unanticipated errors requires an adaptable, learning algorithm. But for a stable,
well understood process, a reasonably complete set of product acceptance
bounds can be established and simple production management decision rules
can be programmed, closing the loop to the process controls and extending
automated quality assessment to automated quality assurance.

An assumption in the above paragraph is acceptance of failures when
unexpected errors occur (but the likelihood of such errors is small because
the system is stable and well understood). Thus, some supervision is
needed. Extending the concept of supervision leads to the definition of
semi-automated QA.

2.3 Semi-Automated QA

Semi-Automated QA is defined to be operator assisted automated
computation of quality measures rather than a direct measure of quality. This
is certainly part of automated QA and may include estimates of quality.
The intent in this definition is to emphasize the participation of an
operator or production control personnel in the process of QA as well as in
production management, and to allow the easy addition of new assessment
algorithms. A semi-automated QA system has more flexibility to adapt to a
changing or maturing system and more ability to tolerate a middle ground of QA,
where estimates of quality meet neither the criteria for acceptance nor those
for rejection, and judgement is required.


2.4 QA of System Components


The focus of a QA system is the output product, and it is appealing 
to consider a QA scheme that only measures observables in the output product
to judge whether or not an output product is acceptable. This is too narrow,
and presumes a completeness of knowledge of output product observables which
is probably not the case. Moreover, given a quality failure, no indication 
of the cause of the error is known. Therefore, QA of system components 
and intermediate products should be performed. 

2.4.1 QA of Hardware Components 

It is presumed that all hardware in a system has been accepted by tests 
against a specification, and QA of hardware components may be performed 
by exercising certain of the acceptance tests, modified as appropriate to 
work in a production environment. QA of hardware components (in addition 
to obvious failure determination) will support trend analyses of parameters 
such as processing time, operator interactions, feedback control parameter
values, power, and expendable usage (if appropriate). Study of the trends
or correlations of trends with failures may lead to prediction capabilities 
to avoid failures. 

2.4.2 QA of Software Components 

Software does not degrade over time as does hardware. There are no
situations where software works properly at one time and fails later, but
there are various conditions where it may appear that the software is acting
in a random fashion (in error). Typical cases occur when input data or
parameters are outside expected bounds and no limit checking is performed.
Another condition may be produced by combinations or sequences of data values
or events not anticipated in specifying, designing, or building the
software. Still, the software is deterministic and will give repeatable
results. There are hardware faults that may make it appear as if the
software failed (such as a bit error, memory failure, or I/O error). To protect
the software itself against hardware errors, it is prudent to have copies
of software and to periodically verify (not just recopy) the operational
software against an archived copy to detect any errors in the operational
code.
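
The periodic verification described above amounts to comparing the
operational code with the archived copy. A minimal sketch follows, assuming
both copies are files that can be read in full and using a SHA-256 digest as
the comparison; the function names and the choice of digest are illustrative
and are not taken from this report.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_against_archive(operational_path, archived_path):
    """True when the operational copy matches the archived copy."""
    return file_digest(operational_path) == file_digest(archived_path)
```

A mismatch would be reported as a failure before any recopy is made, so that
the error in the operational code is logged rather than silently overwritten.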


Quality failures in software, "bugs", are hitherto undiscovered errors.
These errors exist due to failures in specifying the software or inadequate
acceptance testing of the software. (There is an argument that inadequate
testing may be more cost effective than complete testing if the QA and
maintenance processes are inexpensive, or schedule is a driving function.)
Both types of failures can occur on the initial build of the software or
can result from changes made to the software (fixing one bug may uncover
others, or the fix might ignore critical interactions elsewhere in the
process). It is obviously important to have adequate acceptance testing of
changes to software.

Failures in specifying the software can result from the specifications
being incorrect or incomplete. Incompleteness may result from an unforeseen
combination of events or from presuming (or needing) the software to
compensate for failures or tolerances in other components. It may be argued that
the latter cases are not specification failures, and no specification can
afford to be complete regarding every possible failure mode and combination
of conditions, but in any case a change to the specification must be made
to resolve these issues if they occur.

At the output product level it may not be possible to distinguish
between errors caused by hardware and those caused by software.

2.5 QA of the Process

The quality of the output product, given the design of the processing
system, is a function of the hardware and software components discussed
above, the data, and the control parameters. The data will be discussed
in detail in later sections, but it is necessary here to state that the
quality of the input data must be known if proper quality assessment and
assurance is to be performed. Input data quality must be known to assess
quality failures, to order reprocessing or to accept poor output as the
best possible, or to adjust control parameters to optimize processing.

The data contains calibration or reference data which is used to compute 
control parameters. Usually, calibration data is designed to be usable 
even at poor signal to noise ratios or high bit error rates (e.g., step 
functions with many samples per step, linear ramp functions, etc.). Separate 
checking should be performed on calibration data as received to assess the
consistency and reliability of the calibration source. 

Other control parameters are set by the operator or by calculations
performed on the data (including feedback from measures of output data).
The proper use of data (sensor, telemetry and calibration data) depends
on the hardware and software, leaving errors by the operator to be discussed.

All control parameters set by the operator should be recorded, not
to assess blame, although such records will locate causes of quality errors,
but to be analyzed to determine if changes in procedures or training are
necessary.

Analysis of operator controls will also reveal which controls could 
be easily automated. 

2.6 Measures of Quality Assessment

The fundamental measure of quality is whether or not the output product 
meets specifications. The fundamental measure of QA is, therefore, what
fraction of product sent to users meets specification, and what is the cost 
of performing the QA. 

Cost can be measured in various ways, such as system availability,
throughput, average time in process, and overhead (poor quality product,
test, maintenance, calibration). The appropriate measure depends on the
mission of the processing system. A system designed for rapid response
runs to different criteria than a system designed for bulk throughput,
or one designed for custom processing.


User demands and system guarantees set the percentage of output product
which meets specifications.



3.0 GENERIC SYSTEM 


3.1 System Description 

Figure 3-1 presents a block diagram for a general sensor processing
system. It is principally a serial processing system for some set of
standard products. These standard products are generated according to a
specification on some schedule based on receipt of data. In addition to the
standard product, an archival record of the data is kept and is used as a source
to fill custom requests for specially processed data.

The data source in the block diagram represents the sensor and all
processing to format the data, and includes calibration and telemetry data.
Transmission links include all processes up to the receipt of the data
(sensor, calibration and telemetry) at the processing facility. It is at
this point, the input to the initial processing, that a measure of the
quality of the data should be made. Some indication of quality may be available
from the transmission links subsystem to augment or identify sources of
any error, but a quality measure of the input data is necessary as a
reference for later error identification.

The initial processing block in the diagram contains all standard
product processing. The result of initial processing is an archival copy of
the data, which may also be the standard output product.

Typical initial processing which would be included in an image
processing system would include calibration, reformatting, geometric and radiometric
correction, and annotation or association of ancillary data with sensor
data. A more complex process might also include registration to some
standard projection, and destriping and missing data estimation. Note that
registration (which involves resampling), destriping, and missing data
estimation (say, for failed detectors or scan line length variations) are forms
of error correction or quality assurance. If these functions are included
as image processing functions, then they are not considered quality assurance
processes as addressed in this report.

Special or custom processing requires identification of the necessary
processing, and retrieval, processing, and reproduction of the data. Note
that this chain includes the processing of the request as well as the
processing of the data.

3.2 Sources of Error


3.2.1 Data Source

All sensors operate in an environment which produces a signal-to-noise
ratio (SNR) which is a measure of sensor data quality. Estimates of the
SNR at the input processor may be made by a QA process not only to know
the input data quality but to be used in trend analysis of the sensor
performance. SNR estimates may also control later processing and may be used
as feedback control to command sensor system parameters (such as gains,
filters, data compression, etc.).

Since most data appears random at first glance, and since the data has
passed through a transmission system, estimating the SNR may be difficult.
One means is to compare the power spectrum of the input data to expected
power spectra. Transmission system performance can be estimated by analysis
of calibration and other fixed format data (sync patterns, fill data, etc.).

Failures in sensor channels should be recognizable from trend
analyses. However, conceiving an automated response to every imagined sensor
system failure does not appear cost-effective. Automated measurement of trends
and alerting on abrupt changes or threshold crossings certainly is feasible
and recommended.

3.2.2 Transmission Links 


Most transmission links provide error detection and (at least some) 
error correction. These give measures of system performance. As mentioned 
above, calibration data and fixed format data can be used to estimate BERs 
and data drop-outs, but these errors may not be attributable to the transmis- 
sion links, as errors could arise in the sensor system. Schemes such as 
retransmission of data from a remote receiving site or transmission of test 
data can resolve some of these issues. 
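
Estimating a bit error rate from fixed format data reduces to counting the
bits that differ from the known pattern. The sketch below assumes the
received sync patterns have already been located and extracted as byte
strings of the same length as the expected pattern; the function names are
illustrative.

```python
def count_bit_errors(received: bytes, expected: bytes) -> int:
    """Number of bit positions in which two equal-length patterns differ."""
    if len(received) != len(expected):
        raise ValueError("patterns must have the same length")
    # XOR leaves a 1 bit wherever the two bytes disagree.
    return sum(bin(r ^ e).count("1") for r, e in zip(received, expected))


def estimate_ber(received_patterns, expected: bytes) -> float:
    """Estimated bit error rate over all received copies of a known pattern."""
    if not received_patterns:
        raise ValueError("at least one received pattern is required")
    total_bits = 8 * len(expected) * len(received_patterns)
    errors = sum(count_bit_errors(p, expected) for p in received_patterns)
    return errors / total_bits
```

As the text notes, an estimate of this kind measures the combined sensor and
transmission path; it does not by itself attribute the errors to the links.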

3.2.3 Initial Processing 

This process contains the main processes in the system. The output 
is the archival data and usually the "standard" product. The processes 
mentioned in 3.1 above will be addressed in turn. 


3.2.3.1 Calibration


The calibration data must be identified and associated with the correct
sensor data; the quality of the calibration data should be assessed and
trend analysis performed. Sudden changes in calibration data are cause to
suspect the process or the sensor.

Post-calibration data analysis (averages and variances) should be
sufficient to assess the quality of the calibration process. Low frequency
sampling appears capable of maintaining QA on the process if trend analysis
is done on calibration data.

3.2.3.2 Destriping

The need for destriping indicates incorrect or insufficient calibra- 
tion. It is, in fact, a means of calibration. 

If destriping is done (in lieu of calibration per se) trends should 
be maintained on the correction needed in each channel as if it were calibra- 
tion data. Sudden changes or threshold crossings are cause for notification 
of system error. 

QA of the destriping process is probably best done with test data.
An alternative is a second pass of destriped data, which should suffer no
change. This, however, is implementation algorithm dependent.
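
The second-pass check can be illustrated with a deliberately simple
destriping rule. The sketch below assumes an offset-only algorithm that
shifts each detector's lines to the global mean, with image line r produced
by detector r modulo the number of detectors; for that rule a second pass
should leave the data unchanged. Both the rule and the function names are
assumptions, and other destriping algorithms need not have this property.

```python
def destripe(image, n_detectors):
    """Offset-only destriping: move each detector's mean to the global mean.

    image is a list of rows of numbers with at least n_detectors rows.
    """
    pixels = [p for row in image for p in row]
    global_mean = sum(pixels) / len(pixels)
    out = [list(row) for row in image]
    for d in range(n_detectors):
        rows = range(d, len(image), n_detectors)
        det_pixels = [p for r in rows for p in image[r]]
        offset = global_mean - sum(det_pixels) / len(det_pixels)
        for r in rows:
            out[r] = [p + offset for p in image[r]]
    return out


def second_pass_residual(image, n_detectors):
    """Largest pixel change produced by destriping already destriped data."""
    once = destripe(image, n_detectors)
    twice = destripe(once, n_detectors)
    return max(abs(a - b)
               for row_a, row_b in zip(once, twice)
               for a, b in zip(row_a, row_b))
```

A residual above a small numerical tolerance would indicate a fault in the
destriping implementation rather than in the data.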


3.2.3.3 Geometric and Radiometric Correction; Registration and Reformatting


These processes are variations of resampling of data. (Radiometric
correction may be considered calibration or destriping.) There are two
major aspects to resampling: calculating the resampling parameters by
analyzing the data, and performing the resampling.

Calculation of resampling parameters may be done from ancillary data
(telemetry, stored parameters from independent analyses, required output
formats) in an open-loop fashion or from analysis of data content (ground
control points, other correlation). These calculations can be checked by
"reasonableness" checks and by sampled analysis of the processed data.
For example, ground control point residuals can be checked; correlation
of registered data with the registration base can be made and correlation
values analyzed.

The actual resampling process can be monitored by processing test data 
and comparing output to analytically derived results. Example test data 
are constant data, linearly varying data, and fixed frequency data. 
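
A test of this kind can be sketched for a one-dimensional resampler. The
code assumes linear interpolation as the resampling method and positions
lying within the sample range; the function names are illustrative. For
constant or linearly varying test data, linear interpolation should
reproduce the analytic values to within rounding.

```python
def resample_linear(samples, positions):
    """Resample a line of data at fractional positions by linear interpolation.

    Assumes at least two samples and 0 <= position <= len(samples) - 1.
    """
    last = len(samples) - 1
    out = []
    for x in positions:
        i = min(int(x), last - 1)  # left neighbor, clamped so i + 1 is valid
        frac = x - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def max_resampling_error(samples, positions, analytic):
    """Largest deviation of the resampled test data from the analytic result."""
    resampled = resample_linear(samples, positions)
    return max(abs(r - analytic(x)) for r, x in zip(resampled, positions))
```

For example, a ramp with value 2x + 1 at sample x can be resampled at
arbitrary positions and compared against the same expression evaluated at
those positions.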

3.2.3.4 Annotation


Computation of annotation from telemetry or registration processes
is difficult to verify. Trend analyses or sampled checking by an analyst
may be the only reasonable choices. Indications of annotation error may
be found by other processes, especially those in 3.2.3.3 that depend on
the content of data also used for annotation.






3.2.4 Storage and Retrieval System 


3.2.4.1 Storage System

Typical errors in a storage system are data deterioration and lost
data (misplaced, mislabeled, or not in storage). The degree of data
deterioration may be estimated from existing studies or by including fixed test data
in the stored data. Redundant coding and storage techniques could be used
if this is seen as a problem.

Lost data is usually an operator problem and should be minimized by
a good library management system. Data identification should be well
distributed in the data to prevent misidentification.

3.2.4.2 Reproduction

Errors in archival masters are copied to duplication masters. These
in turn are copied to production data, and bit errors are added at each
duplication step. Specific system constraints will dictate the reproduction
process. QA measurement of errors requires measures of errors in the master
and independent measures of errors in the product. Test strips and fixed
format data are useful in monitoring this process, but it is important that
the test data be such that it will be treated identically to actual sensor
data.


QA of the reproduction process could be the most valuable QA in the 
system, and should be relatively easy to automate. 


3.2.4.3 Special Processing


Special processing is system dependent. If a special process is offered
as a standard service to be requested, it should be treated like the initial
processes for QA, except that the cost-effectiveness values of automated QA
are different from those in initial processing. If a special process is a
customized assemblage of available processes or algorithms, then the QA will
probably, perforce, be an individual assessment.

3.2.4.4 Request Processing

Proper record keeping and a good data management system should hold
errors in requests, custom processing parameters, and data identification
to a minimum. Independent checks against order copies and verification
of orders with customers are means to identify errors in request processing.
QA of this process would probably be a manual process, and complete records
of such errors should be kept to determine if changes in procedures are
needed.

3.3 Quality Assessment System 

Automating a quality assessment function requires: 

1. Sensing the parameters used by a QA analyst.

2. Emulating the algorithms used by the QA analyst to reach a decision. 

3. Generating failure or warning reports; tracking progress and close- 
out of outstanding reports. 





The most difficult step, QA algorithms, will be addressed first. 

3.3.1 QA Algorithms 

Algorithms for QA can be classified as those required to assess a known
quality parameter (such as striping between detectors or deviation of calibration
data from nominal values) and those which adapt to evaluate quality
measures not anticipated in the design phase of the system. Both classes
of algorithms are covered in some detail in Section 4, and both classes
of algorithms fit under the broad umbrella of "Artificial Intelligence"
(AI) in common usage. AI successes at present are in knowledge based systems
(or expert systems) in which a body of knowledge is codified as a number
of IF-THEN statements in a fixed hierarchical structure. An example of
such a construct in QA could be:

IF (calibration bias - normal is less than e) THEN (set variable A
to TRUE) ELSE (set variable A to FALSE)

IF (estimated bit error rate is between TABLE (1,1) and TABLE (1,2))
THEN (set variable B to TABLE (1,0))

IF (variable A is TRUE and variable B is greater than x) THEN (issue
quality warning flag 1)





This example makes the point that the QA algorithms, called AI or not, must
be designed for a specific system. The knowledge coded into the algorithm
is quite specific.
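The three IF-THEN rules above might be coded along the following lines. This is only a sketch: the table contents, the function names, and the use of the absolute deviation in the first rule are assumptions made for illustration and are not taken from any particular system.

```python
# Hypothetical lookup table: (lower BER bound, upper BER bound, score).
# The rows and scores are illustrative assumptions only.
BER_TABLE = [
    (0.0, 1e-7, 0),
    (1e-7, 1e-5, 1),
    (1e-5, 1.0, 2),
]


def calibration_ok(bias, normal, epsilon):
    """Rule 1: variable A is TRUE when the bias is within epsilon of normal."""
    return abs(bias - normal) < epsilon


def ber_score(ber, table=BER_TABLE):
    """Rule 2: variable B is the score of the row whose bounds contain ber."""
    for lower, upper, score in table:
        if lower <= ber < upper:
            return score
    return None  # the estimated BER falls outside every row


def quality_warning(bias, normal, epsilon, ber, x):
    """Rule 3: issue the warning when A is TRUE and B is greater than x."""
    a = calibration_ok(bias, normal, epsilon)
    b = ber_score(ber)
    return a and b is not None and b > x
```

For example, `quality_warning(10.2, 10.0, 0.5, 1e-4, 1)` issues the warning, while the same call with a bias of 12.0 does not, because variable A is then FALSE.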

Adaptive algorithms produce values for clauses or terms in the above
IF-THEN statements. For instance, a system might "learn" in a training
mode what values are appropriate as bounds in a table as in the second
IF-THEN statement in the above example. Such a system will make mistakes in
QA analysis until the bounds are "learned," and thereafter unless the bounds
are very stable.

As a system matures, additional tests and algorithms can be implemented
in the QA process. These would reflect additional knowledge gained about
the operation of the system.

3.3.2 Sensing parameters for QA 

It is generally impossible to state completely which parameters or
what data is needed for QA, by the principle that known sources of error
can be avoided; the unanticipated sources of error will cause problems.
For that reason, we state that access to all data should be provided for
the QA process during system design, and the QA process has the choice of
whether to read a specific data element or not. While it is easy to make
such a statement, it is not easy to justify excessive costs to make high
volume or high rate data continuously available for QA. As in any system
design, trades must be made between cost and amount of data access, but
QA access to data must be taken into account during such trades. This is
further addressed in Section 5.





It is presumed that parameters observable by a QA analyst can be computed
from available data and control parameters. This may not be simple,
and can be viewed as "feature extraction" (also addressed in Section 4). For
instance, "zipper" in an image produced from digital data when bit synchronization
is lost for part of the line is obvious to an operator but is not
trivial to find by computation.

It is strongly recommended that access to data and control parameters 
throughout the system be included in the system design. 

3.3.3 Failure and Warning Reports 

In addition to the obvious process management function supported by
the issuance and tracking of QA failure reports, the QA process should
use the history of failures to condition data processing until the fault is
corrected. Knowledge of faults in a process should be used to modify processing
schedules and maintenance. Knowledge of faults in data should be used to 
request reprocessing, or retransmission, or to issue disclaimers on output 
quality from flawed source data. 

3.3.4 Example QA Process 

Figure 3.3-1 presents an example implementation of a QA process in a
scientific data processing system. Only part of the system is shown; addi- 
tional stages of processing may be to the left and right of the portion 
shown. 


FIGURE 3.3-1. Example QA Process in a Generic System Fragment
(figure not reproduced; the only surviving label is PROCESS CONTROL)











The QA process consists of a number of local processes and a QA supervisor
process, shown implemented with a management information system.
The QA processing shown as being separate might be implemented in one central
processor or the local processes might be integrated with their associated
data processing functions. The diagram emphasizes that local processes
(which may implement the same or similar algorithms) are applied to various
data paths, that the quality measures computed by the local QA
processes are fed forward for use in computing later data quality and for
use in assessing performance of the data processing functions, and that the
quality measures are fed to the central or supervisory QA function. The design
allows tests to be run in response to requests by the QA function through
production control when testing is necessary to evaluate the performance
of a data processing function.

The results of the QA process are available for use by production
control in scheduling and controlling the processing of data.

3.4 Data Types

Data subject to QA in a scientific sensor processing system can be 
categorized as sensor data, test data, calibration data, support data and 
management data. The latter, management data, is concerned with ordering 
data, requesting special processing, shipping, controlling inventory, etc. 

The QA of management data will be left to a MIS or review by appropriate 
personnel. Each of the four technical data types is discussed below. 





3.4.1 Sensor Data 


Sensor data is, by its nature, random, but important statistics of
the data must be known for the processing system to be designed and built.
Such statistics include maximum values, maximum variations, precision, variance,
and number of samples per operating sequence. These statistics can be computed
for input data and, in some designs, must be known in order to process
the data. For instance, mean and maximum values and the variance can be
used to destripe image data. The same measurements should be maintained
to calculate calibration system performance. Some scaling of these parameters
may be invariant through processing, and calculation of these parameters
would provide a QA check on the output product. An example would
be the normalized power spectrum of each input sequence in an image processing
system. The transfer function of the processing system should be known
(some MTF compensation might be implemented) and the spectrum of the output
should bear a known relationship with the input.

The variance of the data normalized to the dynamic range could be 
another effective monitoring measure. It is strongly recommended that trends 
of the statistics be maintained to monitor the sensor input and the product 
output. Sampling the statistics would keep the volume of trend data manage- 
able. 
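A minimal sketch of such sampled trend statistics follows. The function names, the sampling interval, and the choice to normalize the variance by the square of the dynamic range (which makes the measure dimensionless) are illustrative assumptions, not part of the study.

```python
from statistics import mean, pvariance


def sequence_statistics(samples, dynamic_range):
    """Compute a few monitoring statistics for one operating sequence."""
    variance = pvariance(samples)
    return {
        "count": len(samples),
        "mean": mean(samples),
        "max": max(samples),
        "variance": variance,
        # Assumption: "normalized to the dynamic range" is taken to mean
        # division by the squared range.
        "normalized_variance": variance / dynamic_range ** 2,
    }


def sampled_trend(sequences, dynamic_range, every=10):
    """Keep statistics for every `every`-th sequence to limit trend volume."""
    return [
        sequence_statistics(sequence, dynamic_range)
        for index, sequence in enumerate(sequences)
        if index % every == 0
    ]
```

Running the same functions on the input and on the output product gives two trend records that can be compared over time.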


It is necessary to know the quality of the input data in order to determine
the achievable quality of the output product. Without this, it will
not be known whether the processing system degraded the data, or if special
processing or reprocessing could improve the data.





Since the input data appears random, the quality of the input data cannot
be directly measured (except for obvious measures such as data counts).
The statistics may yield an estimate of the quality of the input data, but
other sources such as calibration data, bit sync performance and error detection
and correction codes can yield good measures.

Embedding reference data in sensor data should be considered in a system
design. Calibration data, synchronizing sequences, controlled data during
retrace or border scanning in a scanning system or the like can be used
for quality checks on the path up to the input of the processing system
in addition to the purposes for which such data is generated.

Continuing the same concept, it can be worthwhile to process such data
as border data and calibration data as equivalent test sequences through
the processing system. It is important, of course, that such data not interfere
with sensor data processing by skewing the statistics and impacting
control parameters.

3.4.2 Test Data 


Test data is constructed to see if specifications of a system are being
met. Test data usually stresses a system, and usually includes data which
are out of the expected bounds for a system. Results of processing test
data, in an entire test sequence, should be unambiguous. Specific failures
should point to specific processes. Subsets of test data should be used
periodically to check system performance. Trends of the results should
be monitored so system performance can be predicted, and system performance
can be calibrated after a repair or upgrade.

Lacking other measures, test sequences would have to be run on a frequent
schedule to provide QA measures. This presents an overhead to the
system throughput which must be considered in system design.

Another design consideration is the necessity of providing injection
points for test data throughout the system. It is generally unsatisfactory
to have to rely on a single end-to-end test sequence to isolate a quality
problem.

3.4.3 Calibration Data 


In this context, calibration data will encompass all fixed and known
format data. This includes the synchronizing sequences, null words and
similar data discussed in 3.4.1. Within some limits, this data is known.
For example, calibration data may consist of 100 "black" samples, a ramp
of fixed rate to "white", and "white" samples to produce a total of 1000
samples. The QA process should calculate and record for trend analysis
the level and noise on the black and white levels, the slope (length) of
the ramp, the variation of the ramp from nominal (e.g., straight line or
logarithmic), and data counts. It is not difficult to compute adaptive
quality thresholds for some statistics of the calibration data at a given
confidence level. Then, as long as the statistics remain within these threshold
bounds (and the correct statistics are used), the input calibration data
can be considered of good quality. As the example presented in Figure 3.4-1
shows, the x% confidence thresholds can be tighter than the specified thresholds.
Whether they are or not, if the statistics cross the threshold, a quality
warning should be generated. This should be done even if the data is still
within specifications, since it warns of a trend which should be monitored
and explained.

This same general technique should be applied to all statistics for
which trends are maintained.
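The adaptive threshold technique might be sketched as follows. The class name, the use of a mean plus or minus z standard deviations as a stand-in for the x% confidence bounds, and the policy of re-learning the bounds whenever they are crossed are all assumptions made for illustration.

```python
from statistics import mean, stdev


class AdaptiveThreshold:
    """Track one statistic against learned bounds and specified limits."""

    def __init__(self, history, z=3.0, spec_low=None, spec_high=None):
        self.z = z  # assumed stand-in for the x% confidence level
        self.spec_low = spec_low
        self.spec_high = spec_high
        self.history = list(history)  # needs at least two values
        self._reset()

    def _reset(self):
        """Recompute the adaptive bounds from the recorded history."""
        center, spread = mean(self.history), stdev(self.history)
        self.low = center - self.z * spread
        self.high = center + self.z * spread

    def update(self, value):
        """Record a new value and return 'ok', 'warning' or 'out_of_spec'."""
        self.history.append(value)
        crossed = not (self.low <= value <= self.high)
        if crossed:
            self._reset()  # quality warning and threshold reset
        below = self.spec_low is not None and value < self.spec_low
        above = self.spec_high is not None and value > self.spec_high
        if below or above:
            return "out_of_spec"
        return "warning" if crossed else "ok"
```

A value that crosses the adaptive bounds while still inside the specified limits yields "warning", which corresponds to the early trend warning described above.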

3.4.4 Support Data 

Support data includes telemetry and operator-set parameters. Processing 
parameters and annotation data will likely be derived from support data,
so the quality of the support data is important. Another example of support 
data is ground control points used to generate coordinates for matching 
points in imagery. Poor quality ground control points will lead to poor 
quality geometric corrections. 

Each kind of support data must be examined to determine the properties 
that reflect quality. With this, "reasonableness" checks, trend analyses 
and statistical measures may apply. The spacecraft operational control 
functions should not be overlooked as sources of quality measures of teleme- 
try data as they are responsible for knowing the quality of telemetry data 
for purposes of spacecraft control. It may be cost effective to have that 
operation control function monitor sensor as well as spacecraft telemetry. 





The sensor data processing function uses spacecraft data such as attitude
and attitude rates to annotate and sometimes process sensor data.
Knowledge of "pure spacecraft" data (such as thermal and electrical status)
should be available for correlation with data quality. Presumably, necessary
telemetry points will be requested by those responsible for monitoring sensor
status, but the potential use of these and other telemetry points in QA
analyses should be recognized when a limited set of telemetry points
is chosen for the system.

Misalignment of support data with sensor data is obvious enough not
to be overlooked in system design. Uniquely tagging both sensor and support
data is usually simple. In cases where the alignment must be calibrated,
it should be monitored for correctness.


Figure 3.4-1. Example of Adaptive Threshold Monitoring
(figure not reproduced; surviving labels: SPECIFIED QUALITY LEVEL; QUALITY WARNING AND THRESHOLD RESET)



4.0 AUTOMATED QUALITY ASSURANCE 


The first step toward automated quality assurance is assisted (or
semi-automated) quality assessment. The components of either system are:

o Data access
o MIS
o Automated measure of known functions
o Ability to adapt (or be adapted) to new functions

4.1 Data Access 


It is obvious that without data, there can be no quality assessment. 

The kinds of data are discussed in Section 3.4.

Quality errors or faults can be considered as random errors, total
failures or "out of calibration" performance. Failures are generally easy
to detect with a few samples from a data stream (either the data stops,
the data is a constant, or the data is noise with no relation to the correct
statistics). Random errors are difficult to detect. As mentioned above,
catastrophic random errors need to be detected, and reprocessing will usually
solve the issue, but infrequent data value errors should be tolerated to
a limited extent. One reason to tolerate inconsequential errors is the
extreme difficulty in detecting them. Errors causing catastrophes are
similar in their randomness, but are detectable (by definition, a catastrophe
must be easily observable). Specific systems and specific definitions
of catastrophes will determine the level of data processing needed for
detection of such errors.

The remaining class of errors are "out-of-spec" or "out-of-calibration" 
errors. These, which can be avoided by adjustment or maintenance, are 
the standard fare for a QA process, and cost trade-offs should be done
primarily with the control of this type of error in mind. To control this
error, trend analyses and prediction of error conditions should be designed.

To be cost-effective, sampling is recommended. This is discussed
in Section 5.1. Enough data must be sampled to calculate reasonably confident
projections, and from enough paths in the system so that fault isolation
or identification can be done.

4.2 MIS for Reporting and Tracking 

A typical ground image processing system has three types of information
components within it:

o image data
o support data
o production control messages from management

Efficient flow of management information (e.g., work orders, work
schedules, error reports, log entries, report listing, etc.) is critical
since it has an obvious impact on the processing, movement, tracking, and
quality of the image data as well as support data. Clearly, the efficient
operation of the production processor depends on the efficient management
communications and implementation of a central QA function in the data
flow process. Additionally, such a QA function must have access to data
to derive, extract, or compute measures of quality which, in case of bad
output products, will be provided to the production management for it to
take appropriate corrective action (see Figure 4.2.1).

Based upon the state of the art, it is difficult to design a general adaptive
QA process. Consequently, only some QA functions can be fully automated
while others may have to be only semi-automated or largely manual. Furthermore,
it is imperative that a sufficiently long history (or knowledge) be maintained
and made available to the QA function for the purpose of 'training' those
QA functions which are to be automated.

Sensing of parameters to be used by the QA functions can be easily
automated. Such parameters may include, among various others, checks on
data bounds, formats, data counts, trends, statistics such as mean, variance,
max, min, etc., calibration parameters, and geometric correction parameters
(resampling parameters). These and other useful parameters should be properly
tagged and stored on-line for quick access on an as-needed basis. It is
noted that these parameters are sensed at various stages during production
processing. Furthermore, it is important that these parameters be sensed
in a manner that does not interfere with the normal data flow or data processing
in the production processor.

The task of automating the QA algorithms themselves is a complex one
since such algorithms will have to be adaptive just like human QA analysts.


FIGURE 4.2.1 A MIS FOR GENERIC IMAGE PROCESSING SYSTEM



Simple QA algorithms such as checking data bounds, data counts, calibration,
BER measurements, etc. can be automated. Furthermore, these QA algorithms
can be implemented in near real-time without interrupting the normal production
processing.

However, more complex QA algorithms (like the ones to be discussed
in Section 4.3.2) are based upon artificial intelligence (AI) or pattern
recognition (PR) techniques. Most of those AI/PR algorithms can be automated.
The performance (success or failure) of these will depend largely upon:

- availability of a large sample for each QA parameter

- selection criteria used in extracting each QA parameter

Extreme care would be necessary in implementing such QA functions
to avoid interfering with or impacting the performance of the production
process itself. Additionally, some of these QA functions may be highly
processing intensive, thus making implementation in near real-time unlikely;
consequently, some of these QA functions may have to be performed off-line.

Another important function of an automated QA processor is to generate
error reports (including failure and warning reports), track progress,
and close out outstanding error reports. From past experience with
systems such as the Landsat Image Processing Facility (IPF) it is clear that
error reports represent a large portion of the management message volume.

Error reports are usually generated by operators on various subsystems
and then sent to respective subsystem production control groups within
the IPF. It is a very people-intensive effort from generation to correction/
resolution. As many as 1000 error reports/month are generated during some
months and as many as 700 may be unresolved at any given time. The effect
of this on system throughput is significant since the travel of error
reports through the system is less than efficient.

In brief, considering the points/problems mentioned above in manual
handling of error reports, it is absolutely essential that future image
processing systems use an automated on-line management information system
for generation, collection, evaluation and resolution of error reports.
Streamlining of the error report related functions will undoubtedly result
in fewer processing delays, thus yielding a much improved total system output.
System-aided resolution tracking would also result in better control of
outstanding error reports and, in addition, it may allow efficient planning
around erroneous data by the production controller during scheduling.
This would reduce the error-scatter effect. Resolution tracking would
also permit rapid information on previous dispositions for identical or
similar error reports.
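The resolution tracking described above can be illustrated with a very small in-memory sketch. The class names, fields, and the rule that "similar" means the same subsystem and symptom are hypothetical; an operational MIS would use a proper database and a richer notion of similarity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ErrorReport:
    report_id: int
    subsystem: str
    symptom: str
    disposition: Optional[str] = None  # None while the report is outstanding


class ErrorReportLog:
    """Generate, close out, and look up error reports."""

    def __init__(self) -> None:
        self._reports: Dict[int, ErrorReport] = {}

    def open_report(self, subsystem: str, symptom: str) -> ErrorReport:
        report = ErrorReport(len(self._reports) + 1, subsystem, symptom)
        self._reports[report.report_id] = report
        return report

    def close_report(self, report_id: int, disposition: str) -> None:
        self._reports[report_id].disposition = disposition

    def outstanding(self) -> List[ErrorReport]:
        return [r for r in self._reports.values() if r.disposition is None]

    def previous_dispositions(self, subsystem: str, symptom: str) -> List[str]:
        """Dispositions of closed reports with the same subsystem and symptom."""
        return [
            r.disposition
            for r in self._reports.values()
            if r.disposition is not None
            and r.subsystem == subsystem
            and r.symptom == symptom
        ]
```

The list returned by `outstanding()` is the kind of information a production controller could use to plan around data known to be in error.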

A management information system (MIS) consisting of a local management
network (such as 'Ethernet'), a job tracking and error report handling
computer system, and applications software will prove to be essential for
future ground image processing systems. It is possible to incorporate
other functions such as sensing of QA parameters and application of QA
algorithms in the MIS mentioned above and shown in Figure 4.2.1. In the
final analysis, however, the design constraints and cost-effectiveness
should determine the optimal design of the MIS.





4.3 Pertinent Functions of AQA 


4.3.1 Known Functions 


These functions are known; specifications exist for which tests and
evaluation algorithms can be designed. Some examples of Automated QA on
known functions are given in the following paragraphs.

4.3.1.1 Calibration


Calibration data is used to transform sensor data to some fixed reference.
To use calibration data, some processing is done to calculate reference
parameters (black level, gain factors, linearity, etc.) to high accuracy
and precision, even in the presence of many data errors. It is likely
that calibration data is smoothed (or filtered or averaged) in this process.
With very little extra effort, the calibration processing function can
produce measures of the smoothing (variance, extremes, residuals, etc.)
for use by the QA process. Thus, QA can have measures of the quality of
the calibration data, and trend analysis will determine if a particular
set of cal data is anomalous or not, and if slow degradation of the cal
process is occurring. QA is extended to quality assurance when processing
parameters or decisions to reprocess with different or differently estimated
cal data are determined based on the QA of the cal data. Errors in calibration
data can be due to the calibration source or the data path.

If large varying corrections to a specific data channel are being 
performed, the quality of the data in that channel should be suspect. 





Later processing may be improved by ignoring or deweighting contributions
from this channel to processing parameter computations.


4.3.1.2 BER


Bit error rates provide a measure of system performance. They can be
measured if known data is used as a reference. In an operational situation,
it may not be possible to presume that data such as sync words or calibration
levels are, in fact, known. A composite error can be determined and, by
observing the random properties, this error may be allocated between the
data source and components in the transmission path.

Test data and error detection and correction codes are more reliable
and accurate sources for BER measurement.
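When a known reference sequence is available, the bit error rate can be estimated by counting the bits that differ. The following is a minimal sketch; the function name and the byte-string interface are illustrative assumptions.

```python
def bit_error_rate(received: bytes, reference: bytes) -> float:
    """Fraction of bits in `received` that differ from the known reference."""
    if len(received) != len(reference):
        raise ValueError("received and reference must have the same length")
    if not reference:
        raise ValueError("reference sequence is empty")
    # XOR leaves a 1 in every bit position where the two bytes disagree.
    errors = sum(bin(r ^ k).count("1") for r, k in zip(received, reference))
    return errors / (8 * len(reference))
```

For instance, one flipped bit in a two-byte sequence gives a rate of 1/16.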

4.3.1.3 Destriping

Destriping is a specific example of the potential use of quality assessment.


Images may be produced from data collected from a number of detectors
designed to scan adjacent strips in some field of view. That is, parallel
scan lines in the reconstructed image come from different (adjacent) detectors.
Differences in responsivity of the detectors will cause the reconstructed
image to appear striped, so the differences in responsivity are removed
by calibration (having the detectors view a known or common source). If
the responsivity model is not accurate enough or if the calibration scheme
does not work as anticipated, detectable stripes will remain in the output
image. Destriping is a method of removing relative differences in adjacent
lines by assuming that statistics of the data (mean and variance) in adjacent
channels should be identical.
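A minimal sketch of statistical destriping under that assumption is given below. It assumes that scan line r comes from detector r modulo the number of detectors and that the whole-image mean and standard deviation serve as the common target; both are illustrative choices, as are the names.

```python
from statistics import mean, pstdev


def destripe(image, n_detectors):
    """Match each detector's mean and standard deviation to the image's."""
    all_pixels = [value for row in image for value in row]
    target_mean, target_std = mean(all_pixels), pstdev(all_pixels)
    result = [list(row) for row in image]
    for detector in range(n_detectors):
        # Assumed layout: every n_detectors-th line belongs to one detector.
        rows = range(detector, len(image), n_detectors)
        pixels = [value for r in rows for value in image[r]]
        level, spread = mean(pixels), pstdev(pixels)
        gain = target_std / spread if spread > 0 else 1.0
        for r in rows:
            result[r] = [(v - level) * gain + target_mean for v in image[r]]
    return result
```

The per-detector mean and standard deviation computed here are the same statistics a QA process could record for trend analysis.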

Quality assessment should measure striping if there is a specification
regarding line-to-line variation. The existence of stripes can be estimated
by computing the same statistics used to correct striping, or by some other
method such as a power spectrum analysis. (Computing the power spectrum
of an image in the direction orthogonal to the scan direction, say by a
fast Fourier transform, would show if a spike of energy existed at the
scan frequency. This should be attributed to striping if the data used
has remained in the digital domain. Data scanned from an image could show
the scan frequency due to printing spot size errors or line spacing variations.)
If the power spectrum measure is used, no data is available for
correcting striping but an independent measure is known. If the same statistics
are used to measure the quality of destriping as were used to destripe,
then the QA process measures the implementation of the destriping process
and must rely on the analytical correctness of the statistical destriping
process to actually remove stripes to specification levels.
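The power spectrum check might be sketched as follows. For brevity a naive discrete Fourier transform replaces the fast Fourier transform, the profile is formed from the mean of each scan line, and the stripe frequency is assumed to be one cycle per group of detectors; the names and the ratio measure are illustrative assumptions.

```python
import cmath
from statistics import mean, median


def power_spectrum(profile):
    """Power at 1..n//2 cycles per profile length, by a naive DFT."""
    n = len(profile)
    level = mean(profile)
    centered = [value - level for value in profile]  # remove the DC term
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * cmath.pi * k * index / n)
            for index, value in enumerate(centered)
        )
        spectrum.append(abs(coefficient) ** 2 / n)
    return spectrum


def striping_ratio(image, n_detectors):
    """Power at the stripe frequency relative to the median of other bins."""
    if len(image) % n_detectors:
        raise ValueError("line count must be a multiple of n_detectors")
    # One value per scan line: a profile orthogonal to the scan direction.
    profile = [mean(row) for row in image]
    spectrum = power_spectrum(profile)
    k = len(profile) // n_detectors  # cycles per profile at the stripe period
    peak = spectrum[k - 1]
    others = [p for index, p in enumerate(spectrum) if index != k - 1]
    background = median(others)
    if background == 0:
        return float("inf") if peak > 0 else 0.0
    return peak / background
```

A large ratio indicates a spike of energy at the stripe frequency; the level at which a warning is issued would have to be set for the specific system.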

4.3.2 Adaptive Functions 

A pattern recognition system is generally composed of the following 
elements: 





- Input pattern 


- Environment 

- Feature extraction 

- Decision (classification) algorithm 

- Adaptive or Learning mechanism 

The word "adaptive" in pattern recognition is generally used to mean
that the strategy of the feature extraction and classification algorithms
can be changed flexibly according to the state of the input patterns and
their environment, with the additional function of learning.

The simplest approach for pattern recognition is probably the method
of "template-matching" where a set of templates or prototypes, usually
one for each pattern class, is stored in the machine. The input pattern
(with unknown classification) is compared with the template of each pattern
class and the classification is based on a pre-selected matching criterion
or similarity function. In other words, the input pattern is assigned
to the pattern class whose template it matches the best.
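Template matching can be sketched in a few lines. Euclidean distance is used here as the matching criterion purely as an assumption; the class names and template values are hypothetical.

```python
import math


def classify_by_template(pattern, templates):
    """Return the class whose stored template is closest to the pattern.

    `templates` maps a class name to one template vector of the same
    length as `pattern`; the smallest Euclidean distance wins.
    """
    return min(templates, key=lambda name: math.dist(pattern, templates[name]))
```

For example, with templates `{"acceptable": [0, 0, 0], "striped": [1, 0, 1]}`, the pattern `[0.9, 0.1, 0.8]` is assigned to "striped".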

The main disadvantage of the template-matching approach is that it
is sometimes difficult to select a good template for each class and also to
define a proper matching criterion. This difficulty is especially pronounced
when large variations and distortions are expected in all patterns belonging
to one class.

Consequently, a more suitable approach is to classify based upon selected
measurements extracted from the input pattern (see Figure 4.3.1). These
selected measurements are called features and are supposed to be invariant
or less sensitive with respect to commonly encountered variations and distortions.
Under this approach, pattern recognition can be considered as
consisting of 3 subproblems.

1. What to measure? That is, what primitive measurements should
be represented in the input pattern?

2. What measurements (features) to extract from the input pattern
and how? I.e., feature extraction.

3. What pattern classification technique to use to make a class assignment
to the input patterns based upon selected features? I.e., the pattern
classification algorithm.

Before operating a pattern classifier, one must first decide which
measurements to use as the input pattern. Unfortunately, there is very
little theory to guide the selection of measurements. At worst this selection
process may be guided solely by the designer's intuitive ideas about which
measurements play an important role in the classification at hand. At best
the process can make use of known information about some measurements that
are certain to be important.

For QA in image processing systems, such measurements may include,
among various others, BER measurements, signal to noise ratio, statistical
parameters such as mean, variance, etc. We will henceforth assume that
a sufficient (large) number of measurements yielding the pattern to be
classified have been selected wisely, remembering that the pattern classifier
cannot itself compensate for a careless selection of measurements. Usually
the decision of what to measure is rather subjective and highly dependent
on practical considerations such as the availability of measurements, cost
of measurements, etc. For instance, a certain measurement for QA may be
known to contain extremely useful and important information. Yet, the
cost of making that measurement may be prohibitive, thus making it impractical.

Feature Extraction 


As mentioned earlier, each of these measurements may carry a small
amount of information about the sample or pattern to be classified. This
high dimensionality makes many pattern recognition problems difficult.
Obviously, as the number of measurements (or inputs) for the classifiers
increases, the design of the classifier becomes more difficult. In order
to simplify the problem, we should find some way to extract/select important
features from the measured patterns. This problem is called 'feature extraction'
and is a key problem in pattern recognition.

Feature selection is generally a process of mapping the original measurements
into more effective features. If the mapping is linear, the mapping
function is well defined and our task thus reduces to finding the coefficients
of the linear functions so as to maximize or minimize a criterion function.
Consequently, if we have the proper criterion for evaluating the effectiveness
of features we can use well-developed techniques of linear algebra or apply
optimizing techniques to determine these mapping coefficients, e.g., Principal
Component Analysis.
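As an illustration of a linear feature mapping, the sketch below finds the first principal component of a set of measurement vectors by power iteration on the sample covariance matrix and then projects a measurement onto it. This is a simplified stand-in for a full principal component analysis; the function names, the all-ones starting vector, and the fixed iteration count are assumptions.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (means, unit vector) for the direction of greatest variance."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - means[j] for j in range(dim)] for s in samples]
    covariance = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    # Assumed starting vector; it must not be orthogonal to the answer.
    vector = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = [
            sum(covariance[a][b] * vector[b] for b in range(dim))
            for a in range(dim)
        ]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # no variance along the current direction
        vector = [x / norm for x in product]
    return means, vector


def project(sample, means, component):
    """Map one measurement vector to a single feature value."""
    return sum((x - m) * c for x, m, c in zip(sample, means, component))
```

Each projected value is one feature; further components could be extracted in the same way after removing the variance already explained.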





Unfortunately, in many applications, there are important features
which are not linear functions of original measurements. So, the basic
problem translates into finding a proper non-linear mapping function for
the given data. Since we don't have any general theory to generate such
mapping functions systematically and to find the optimum one, the selection
of effective features becomes very much problem oriented.

4.3.2.1 Pattern Classification Techniques

The concept of pattern classification may be expressed in terms of
a mapping from feature space to the decision space. Figure 4.3.2 shows
a generic artificial intelligence (AI) oriented system approach to quality
assessment (QA) in an operational image processing system. It is presumed
that the production processor is being monitored at N different stages
and that at each of these stages one is able to make a sufficiently large
number of measurements which are reasonably useful for QA. Thus, the K_I
measurements made at stage I can be represented by a vector P(I) representing
the input pattern for stage I. The feature extractor for stage I would then
yield a feature vector, p(I), containing k_I (<< K_I) important features.

The problem of pattern classification can now be restated as follows:
"Formulate a classification algorithm to assign each possible feature vector
p(I) to the proper pattern class, i.e., class 1 or class 2," where class 1
contains all input patterns of acceptable quality and class 2 contains input
patterns that are not of acceptable quality. Mathematically, this problem can be
formulated in terms of "discriminant functions", D_i[p(I)], i = 1, 2.


FIGURE 4.3.2 (figure not recoverable; surviving caption fragment: "...PROACH TO QA")





The decision rule is given by:

\[
D_1[p(I)] > D_2[p(I)] \;\Rightarrow\; p(I) \text{ belongs to class 1 (i.e., } p(I) \text{ is of acceptable quality)}
\]

\[
D_1[p(I)] < D_2[p(I)] \;\Rightarrow\; p(I) \text{ belongs to class 2 (i.e., } p(I) \text{ is not of acceptable quality)}
\]

And, the decision boundary (i.e., boundary of partition between class
1 and class 2 in the feature space) is expressed by the following equation,

\[
D_1[p(I)] = D_2[p(I)]
\]

or,

\[
D_1[p(I)] - D_2[p(I)] = 0
\]

A general block diagram for such a classifier is shown in Figure 4.3.3
while Figure 4.3.4 depicts a 2-dimensional illustration of the decision
boundary. A wide variety of discriminant functions (e.g., linear, piecewise
linear, minimum-distance, quadratic, polynomial, etc.) are described
in the literature. For the sake of simplicity, however, only linear discriminant
functions will be discussed here. It should be pointed out that classifiers
that use linear discriminant functions are called "linear classifiers".

A linear discriminant function is a linear combination of feature
measurements, i.e.,


FIGURE 4.3.3 A PATTERN CLASSIFIER

FIGURE 4.3.4 A DECISION BOUNDARY IN A 2-DIMENSIONAL FEATURE SPACE


Di [p(D] = Wi(I) • p(l) + Wi(0) , i=1, 2 

= Wi(I) • p(l) 


where. 


Wi(I) is the weight vector and p(I) is the augmented feature vector. 


Let, 


D(p(I)] = Ditp(I)] - D2CP(I)] = W(I) • p(I) 


Hie decision rule is given by; 


if 

D[p(l)] > 0, then p(I) is acceptable 


and if 

D[p(l)] < 0, then p(I) is not acceptable. 

This decision rule, if necessary, can be easily extended to multi-class
situations, so that the feature vector p(I) would be assigned to the pattern
class i with the largest value of $D_i[p(I)]$.
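The two-class rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the example weight vector `W` are hypothetical and do not come from the study, and the augmented vector is formed by appending a constant 1 to the feature measurements.

```python
def augment(features):
    """Return the augmented feature vector: the features plus a constant 1."""
    return list(features) + [1.0]


def discriminant(weights, features):
    """Evaluate D[p] = W . p, where p is the augmented feature vector."""
    p = augment(features)
    if len(weights) != len(p):
        raise ValueError("weights must have one more entry than features")
    return sum(w * x for w, x in zip(weights, p))


def classify(weights, features):
    """Apply the two-class decision rule to one feature vector."""
    d = discriminant(weights, features)
    if d > 0:
        return "acceptable"
    if d < 0:
        return "not acceptable"
    return "on boundary"  # D = 0: the rule leaves this case undecided


# Hypothetical weights for two features; the last entry is the constant term.
W = [1.0, -1.0, 0.5]
print(classify(W, (2.0, 1.0)))
print(classify(W, (0.0, 2.0)))
```

A pattern lying exactly on the decision boundary is reported separately here, since the rule as stated assigns a class only when D is strictly positive or strictly negative.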


4.3.2.2 Learning in Linear Classifiers


The implementation of the linear classification technique described
above requires that proper values of the "weights" be available. In
practice, however, the correct values of the weights are not known, and
the classifier should therefore be designed to estimate
the best values of the weights from feature vectors. By observing feature
vectors with known classifications, the classifier should be able to automatically
adjust the weights in order to achieve correct recognition, and
its performance should gradually improve as more and
more patterns are observed. This process is called "training" or "learning",
while the patterns used as inputs are called "training patterns".

For the sake of simplicity, it can be assumed that the augmented training
patterns or feature vectors belonging to the two pattern classes are linearly
separable (i.e., can be separated by a hyperplane in the feature space). This
means that a weight vector W(I) exists such that

     $W(I) \cdot p(I) > 0$ for each training pattern p(I) in class 1
and  $W(I) \cdot p(I) < 0$ for each training pattern p(I) in class 2

The "error-correction" training procedure can be summarized as follows:

For any training pattern in class 1, the above product (i.e., $W(I) \cdot p(I)$)
must be positive. If the output of the classifier is erroneous (i.e.,
product < 0) or undefined (i.e., product = 0), the weight vector should be
adjusted to yield a new weight vector $W'(I) = W(I) + a\,p(I)$, where a > 0 is
called the correction increment.

On the other hand, for any training pattern in class 2, this product
must be negative. Otherwise, the weight vector should be adjusted to give
$W'(I) = W(I) - a\,p(I)$.

Prior to training, the weight vector can be initialized to any convenient
value. Some rules for selecting the correction increment
(a) are given below.

(i) Fixed increment rule: a is any fixed positive number.

(ii) Absolute correction rule: a is chosen to be the smallest integer
     such that the corrected product $W'(I) \cdot p(I)$ has the required
     sign (e.g., > 0 for a class 1 pattern).

(iii) Fractional correction rule:

     $$a = b \, \frac{|W(I) \cdot p(I)|}{p(I) \cdot p(I)}$$

Each of these correction rules is known to converge to a solution
for the weight vector in a finite number of training iterations when the
training patterns are linearly separable, as assumed above.
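The error-correction procedure with the fixed increment rule can be sketched as follows. This is a minimal illustration, assuming linearly separable training patterns; the function name, the `max_epochs` guard, and the four training patterns are hypothetical and are not taken from the study.

```python
def train_error_correction(patterns, labels, a=1.0, max_epochs=100):
    """Fixed-increment error-correction training for two classes.

    patterns: sequence of feature tuples (not yet augmented)
    labels:   1 or 2 for each pattern
    Returns the augmented weight vector W.
    """
    w = [0.0] * (len(patterns[0]) + 1)  # any convenient initial value
    for _ in range(max_epochs):
        corrections = 0
        for features, label in zip(patterns, labels):
            p = list(features) + [1.0]  # augmented feature vector
            product = sum(wi * pi for wi, pi in zip(w, p))
            if label == 1 and product <= 0:
                # Erroneous or undefined output for class 1: W' = W + a p
                w = [wi + a * pi for wi, pi in zip(w, p)]
                corrections += 1
            elif label == 2 and product >= 0:
                # Erroneous or undefined output for class 2: W' = W - a p
                w = [wi - a * pi for wi, pi in zip(w, p)]
                corrections += 1
        if corrections == 0:  # every training pattern is recognized correctly
            return w
    raise RuntimeError("no separating weight vector found within max_epochs")


# Hypothetical, linearly separable training patterns.
patterns = [(2.0, 2.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
labels = [1, 1, 2, 2]
weights = train_error_correction(patterns, labels)
print(weights)
```

The epoch limit is a practical safeguard added for this sketch; if the patterns are not linearly separable the corrections never stop on their own.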

4.3.2.3 Statistical Decision Techniques

In the preceding sections it was assumed that the feature measurements,
p(I), are deterministic quantities. However, in many applications, such
as image processing, this is not always true, since noise effects in making
these measurements cannot be neglected; in addition, the input patterns
within one class may vary widely.

One approach is to consider the feature vector, p(I), as a multivariate
random variable having a known probability density function and a known probability
of occurrence for each pattern class. Based upon this a priori information,
the function of a pattern classifier is to perform the classification task
so as to minimize the probability of misrecognition. The optimal decision rule
which minimizes the average loss* is called the "Bayes Decision Rule" and
a classifier that implements this rule is called a "Bayes Classifier."

Perhaps an example would help one understand the above and also reduce
the mathematical complexities associated with formulating such decision rules.
Assume that parameters such as "gain" (G) and "bias" (B) are found to be
important in performing QA on the radiometric correction process and that a sufficiently
large sample of these features is available for training.

Further, let

$f_1$ = probability of occurrence of Class 1

$f_2$ = probability of occurrence of Class 2

$F_1$ = probability density function for all samples [gain, bias] belonging
to Class 1

$F_2$ = probability density function for all samples [gain, bias] belonging
to Class 2


*Loss incurred by the classifier when it misrecognizes a pattern. For the (0,1)
loss function, the average loss is essentially the same as the probability
of misrecognition.



The Bayes Decision Rule will render the following as the decision boundary
between Classes 1 and 2:

$$f_1 F_1 - f_2 F_2 = 0$$

If both parameters, i.e., gain and bias, have Gaussian density functions
within each pattern class (which is realistic to assume), then the above decision
boundary is a hyperquadric of the form $a G^2 + b B^2 + c\,G B + d = 0$. The
coefficients of this equation are functions of the mean and variance of gain
and bias in each pattern class.

The decision rule for an incoming test pattern having gain G and bias
B will be as follows:

if G and B are such that the pattern falls above the decision boundary
(i.e., on the side where $f_1 F_1 > f_2 F_2$),
then it belongs to Class 1. Otherwise, the pattern belongs to Class 2.

Special Case: when the covariance matrices of both pattern classes are
equal and are a unit matrix (or can be transformed into a unit matrix
by performing a whitening transformation), the Bayes Classifier discussed
above takes a much simpler form and becomes a distance classifier, and the
decision boundary is the perpendicular bisector of the line joining the
mean values of gain and bias for the respective classes (see Figure 4.3.5).

It is believed that a classification technique similar to the one
above might also prove to be useful for performing QA on parameters like
bit-error-rate (BER) and signal-to-noise ratio (SNR).
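As a rough illustration of the gain/bias example, the sketch below compares $f_1 F_1$ with $f_2 F_2$ for an incoming pattern. It rests on assumptions not made in the study: gain and bias are treated as independent Gaussian variables within each class, and every name and numerical value (priors, means, variances) is hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian probability density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def class_score(gain, bias, prior, stats):
    """Return f * F(gain, bias), assuming gain and bias are independent."""
    (gain_mean, gain_var), (bias_mean, bias_var) = stats
    density = gaussian_pdf(gain, gain_mean, gain_var) * gaussian_pdf(bias, bias_mean, bias_var)
    return prior * density


def bayes_classify(gain, bias, priors, class_stats):
    """Assign Class 1 when f1*F1 >= f2*F2, otherwise Class 2."""
    score_1 = class_score(gain, bias, priors[0], class_stats[0])
    score_2 = class_score(gain, bias, priors[1], class_stats[1])
    return 1 if score_1 >= score_2 else 2


# Hypothetical training results: ((gain mean, gain var), (bias mean, bias var)).
PRIORS = (0.9, 0.1)
CLASS_STATS = (
    ((1.0, 0.01), (0.0, 0.04)),  # Class 1: acceptable
    ((1.3, 0.01), (0.5, 0.04)),  # Class 2: not acceptable
)

print(bayes_classify(1.02, 0.05, PRIORS, CLASS_STATS))
print(bayes_classify(1.30, 0.50, PRIORS, CLASS_STATS))
```

In practice the means and variances would be estimated from the training sample, and correlated gain and bias would call for a full bivariate density rather than the product used here.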

4.3.2.4 Sequential Decision Techniques

In the statistical classification system described in Section 4.3.2.3,
all the kj features are observed by the classifier at one stage. Additionally,
the cost of making feature measurements was not taken into consideration.
Usually, an insufficient number of feature measurements will not result
in satisfactory levels of correct classification. On the other hand, an
arbitrarily large number of feature measurements is impractical. The problem
is especially pertinent when the cost of making a feature measurement is
high. For example, if the measurement requires that the production process
be interrupted or completely stopped, or if elaborate equipment, excessive
time, or complicated operations are required to perform the measurement,
then these factors may limit or even prohibit the use of such a feature.

FIGURE 4.3.5 A DISTANCE CLASSIFIER

In such instances, sequential decision techniques provide a necessary balance
between the usefulness of a feature measurement and the cost of making that
measurement. A trade-off between the error (misrecognition) and the number
of features to be measured can be obtained by making feature measurements
sequentially and terminating the sequential process (i.e., making a decision)
when a sufficient/desirable accuracy of classification has been achieved.

Since the feature measurements are to be made sequentially, the order
of features to be measured becomes important. The feature ordering scheme
should be such that measurements taken in that order lead to the
terminal decision as early as possible. As a result, the problem of feature ordering
is very important in sequential recognition systems.


Wald's sequential probability ratio test (SPRT) is one of the best
sequential procedures known. At the i-th stage of the sequential process,
i.e., after the i-th feature measurement is taken, the classifier computes
the sequential probability ratio, R(i):

$$R(i) = \frac{F_1(i)}{F_2(i)}$$

where $F_1$ and $F_2$ are the probability density functions as defined in Section
4.3.2.3. This value of R is then compared with two stopping boundaries,
$S_1$ and $S_2$. The decision rule then becomes:

if $R \ge S_1$, then pattern p(I) belongs to Class 1, and
if $R \le S_2$, then pattern p(I) belongs to Class 2.


On the other hand, if $S_2 < R < S_1$, then an additional feature measurement
should be taken and the decision process proceeds to stage i+1. The stopping
boundaries are related to the error (misrecognition) probabilities in the
following manner:

$$S_1 = \frac{1 - e_{21}}{e_{12}} \quad \text{and} \quad S_2 = \frac{e_{21}}{1 - e_{12}}$$

where $e_{ij}$ is the probability of deciding that p(I) belongs to Class i when
p(I) actually belongs to Class j [i, j = 1, 2]. It has been shown
that Wald's SPRT is optimal; that is, for given values of $e_{12}$ and $e_{21}$ there
is no other procedure with error probabilities or expected
risk at least as low and with a smaller average number of feature measurements.


It should be noted that Wald's SPRT results in two decision boundaries
which partition the feature space into three regions:


1. The region associated with Class 1

2. The region associated with Class 2

3. The region of indifference (null region)


The region between the two boundaries is the region of indifference, in
which no terminal decision is made. It is obvious, but important to note,
that the decision boundaries in a sequential process vary with the number
of feature measurements. For this reason, it is highly likely that such
a process will be extremely useful while performing QA of the geometric
correction process based upon, for example, resampling parameters. For
example, if $x_1, x_2, x_3, \ldots$ are independent measurements made during the resampling
process, then, assuming Gaussian density functions (having means $m_1$ and
$m_2$ and variance v for the two classes), the sequential probability ratio R(i)
can be computed numerically.*


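After the first parameter $x_1$ is measured, R(1) is given (in logarithmic form) by

$$\operatorname{Log} R(1) = \frac{(m_1 - m_2)\,x_1 - \tfrac{1}{2}(m_1^2 - m_2^2)}{v}$$

and, taking $m_1 > m_2$, the decision boundaries are given by

if $x_1 \ge \frac{v}{m_1 - m_2}\operatorname{Log} S_1 + \tfrac{1}{2}(m_1 + m_2)$, then the pattern belongs to Class 1;

if $x_1 \le \frac{v}{m_1 - m_2}\operatorname{Log} S_2 + \tfrac{1}{2}(m_1 + m_2)$, then the pattern belongs to Class 2;

and if

$$\frac{v}{m_1 - m_2}\operatorname{Log} S_2 + \tfrac{1}{2}(m_1 + m_2) < x_1 < \frac{v}{m_1 - m_2}\operatorname{Log} S_1 + \tfrac{1}{2}(m_1 + m_2)$$

then the next resampling parameter ($x_2$) is observed and the sequential decision
process proceeds to Stage 2.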


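*For simplicity of computation, Log(R(i)) is computed instead of R(i).

After measuring $x_2$, one can compute R(2) as follows:

$$\operatorname{Log} R(2) = \frac{m_1 - m_2}{v}\,[x_1 + x_2 - (m_1 + m_2)]$$

Proceeding as before, the decision boundaries are given by:

if $x_1 + x_2 \ge \frac{v}{m_1 - m_2}\operatorname{Log} S_1 + (m_1 + m_2)$, then the pattern belongs to Class 1;

if $x_1 + x_2 \le \frac{v}{m_1 - m_2}\operatorname{Log} S_2 + (m_1 + m_2)$, then the pattern belongs to Class 2;

and if

$$\frac{v}{m_1 - m_2}\operatorname{Log} S_2 + (m_1 + m_2) < x_1 + x_2 < \frac{v}{m_1 - m_2}\operatorname{Log} S_1 + (m_1 + m_2)$$

then the next resampling parameter $x_3$ will be observed and the decision process
will proceed to Stage 3. This process may continue for several more stages.
In general, the sequential classification procedure becomes such that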
In general, the sequential classification procedure becomes such that 


if $\sum_{i=1}^{n} x_i \ge \frac{v}{m_1 - m_2}\operatorname{Log} S_1 + \frac{n}{2}(m_1 + m_2)$, then the pattern belongs to Class 1;

if $\sum_{i=1}^{n} x_i \le \frac{v}{m_1 - m_2}\operatorname{Log} S_2 + \frac{n}{2}(m_1 + m_2)$, then the pattern belongs to Class 2;

and if

$$\frac{v}{m_1 - m_2}\operatorname{Log} S_2 + \frac{n}{2}(m_1 + m_2) < \sum_{i=1}^{n} x_i < \frac{v}{m_1 - m_2}\operatorname{Log} S_1 + \frac{n}{2}(m_1 + m_2)$$

then the process continues to the next stage (n + 1).

The width of the region of indifference is proportional to $v/(m_1 - m_2)$
and, hence, for given or assigned values of the error probabilities $e_{12}$ and
$e_{21}$, the average number of feature measurements for termination of this
sequential process depends directly on the variance v and inversely on $(m_1 - m_2)$.
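The Gaussian example above can be sketched as a short routine that accumulates Log R(n) one measurement at a time and stops as soon as a stopping boundary is reached. This is an illustration only: the function names and the numerical values are hypothetical, and equal variance v in both classes is assumed, as in the text.

```python
import math


def stopping_boundaries(e12, e21):
    """Wald's stopping boundaries: S1 = (1 - e21)/e12, S2 = e21/(1 - e12)."""
    return (1.0 - e21) / e12, e21 / (1.0 - e12)


def sprt_gaussian(measurements, m1, m2, v, e12, e21):
    """Sequential test between two Gaussian classes with common variance v.

    Returns (decision, n_used); decision is 1, 2, or None when the
    measurements run out inside the region of indifference.
    """
    s1, s2 = stopping_boundaries(e12, e21)
    log_s1, log_s2 = math.log(s1), math.log(s2)
    log_r = 0.0
    n = 0
    for n, x in enumerate(measurements, start=1):
        # Each independent measurement adds its own log density ratio.
        log_r += (m1 - m2) / v * (x - 0.5 * (m1 + m2))
        if log_r >= log_s1:
            return 1, n
        if log_r <= log_s2:
            return 2, n
    return None, n


# Hypothetical class means, variance, and tolerated error probabilities.
print(sprt_gaussian([1.2, 0.9, 1.5, 1.1, 1.4, 1.3], m1=1.0, m2=0.0, v=1.0, e12=0.05, e21=0.05))
print(sprt_gaussian([-0.5, -0.2, 0.1, -1.0], m1=1.0, m2=0.0, v=1.0, e12=0.05, e21=0.05))
```

Working with the accumulated logarithm directly, rather than with thresholds on the sum of the measurements, avoids having to assume which class mean is the larger.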

It has been proven in the literature that Wald's SPRT


1. terminates with probability 1

2. minimizes the average number of observations needed to achieve a given
   set of error probability values

3. is optimal

It should be noted that there exists a trade-off between the number
of feature measurements that can be tolerated and the selection of values for
the error probabilities $e_{12}$ and $e_{21}$.

4.3.2.5 Learning in Sequential Pattern Recognition Systems

In the previous section, all the information relevant to the statistical
characteristics of patterns in each class was assumed to be completely known.
However, in practical situations, this information is only partially known.
One approach is to design a pattern recognition system which has the capability
of learning the unknown information during its operation. The decisions
(feature selections and classifications) are then made on the basis of
learned information. If the learned information gradually approaches the
true information, then the decisions based upon the learned information
will eventually approach the optimal decisions, as if all the information
required were known. Therefore, during the system's operation, the performance
and the knowledge of the system are gradually improved. The process which
acquires the necessary information for decisions during system operation and
which improves system performance is usually called "learning" or "adapting."

During the operation of a pattern recognition system, the system learns
(estimates) the necessary information about each pattern class by actually
observing various patterns. In other words, the unknown information is
obtained from these observed patterns. Depending upon whether or not the correct
classifications of the observed input patterns are known, the learning
process performed by the system can be classified as "learning with a
teacher" or "supervised learning," and "learning without a teacher" or
"nonsupervised learning." In the case of supervised learning, Bayesian
estimation and stochastic approximation can be used to successively estimate
(learn) unknown parameters in a given form of feature distribution for
each class. The successive estimation of continuous conditional probabilities
of each pattern class can be performed by applying the potential function
method or stochastic approximation. The similarities between certain
Bayesian estimation schemes and the generalized stochastic approximation
algorithm have been demonstrated. It has also been shown that certain
learning algorithms of the potential function method belong to the class
of stochastic approximation algorithms. In nonsupervised learning (or
clustering), the correct classifications of the observed patterns are not
available and the problem of learning is often reduced to a process of
successive estimation of some unknown parameters in either a mixture distribution
of all possible pattern classes or a known decision boundary.
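The idea of successively estimating an unknown class parameter from labelled patterns can be illustrated with the simplest possible case, a running average of one feature. This is a sketch under stated assumptions, not one of the specific Bayesian or potential-function schemes cited above: the class name is hypothetical, and the 1/n step size is simply the one that reproduces the sample mean.

```python
class ClassMeanEstimator:
    """Running estimate of one feature's mean for a single pattern class."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def observe(self, value):
        """Fold one labelled training measurement into the estimate."""
        self.count += 1
        # Move the estimate toward the new value with step size 1/count.
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical gain measurements from patterns known to be acceptable.
estimator = ClassMeanEstimator()
for gain in (1.0, 2.0, 3.0):
    estimator.observe(gain)
print(estimator.mean)
```

Because each update needs only the previous estimate and the count, the estimate can be refined during operation without storing past patterns.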

One property of the SPRT that can be used to improve the accuracy of
classification is that the error (misrecognition) probability can be reduced by varying
the stopping boundaries. It has been shown that in the SPRT, if the upper stopping
boundary $S_1$ is increased and the lower stopping boundary $S_2$ is decreased,
then at least one of the error probabilities, $e_{12}$ and $e_{21}$, decreases.


4.3.2.6 Summary


It is necessary to emphasize again that the selection of features is
an important problem in pattern recognition and that it is closely related to
the performance of classification. Furthermore, in sequential pattern
recognition systems, the ordering of features for successive measurements
is very important. The purpose of feature ordering is to provide, at successive
stages of the sequential classification process, the feature which is most "informative"
among all possible choices of features for the next measurement, so
that the decision process can be terminated as early as possible.





5.0 USE OF QUALITY ASSESSMENT IN QUALITY ASSURANCE 

5.1 Sampling as a Tool 


In a well-designed system, it should be unnecessary, and is probably
overly expensive, to test every piece of data at every stage in the process.
Moreover, it is statistically certain that a number of errors will be in
the data. QA must be concerned with the errors which are catastrophic to
the data (such as loss of sync) and be somewhat tolerant of simple data value
errors (such as radiance errors). It should be clear that catastrophic
errors are easier to detect and, if caused by some random phenomenon, can
be eliminated by reprocessing. (It is not obvious how a radiance error
could be detected after, say, a resampling process. Majority voting on
three runs is an expensive possibility.)

In a system where a certain small number of detectable errors may be
permitted but many errors cannot be permitted, sampling provides a means
to estimate the number of errors without exhaustive testing. This may be
the case wherein proper operation of a system produces only a few statistically
generated errors (as from BER), but system failure produces many errors.
As will be shown, sampling does not aid the case where a single error
(or two or three errors) is intolerable.

Sampling can provide estimates of errors (bad data) in a population
where trend analysis or threshold monitoring is being performed. In a population
of N items with n errors, the probability that a sample of size k will
contain x errors is given by the hypergeometric distribution

$$P(x) = \frac{\binom{n}{x}\binom{N-n}{k-x}}{\binom{N}{k}}$$

Given that a sample of size k is taken and that it does contain x errors,
the maximum likelihood estimator of n, the number of errors in the
total population, is

$$\hat{n} = \text{greatest integer not exceeding } \frac{x(N-1)}{k}$$

This simply says that the proportion of errors in the population is most likely
the same as the proportion of errors in the sample. The variance in $\hat{n}$ is


$$\sigma_{\hat{n}}^2 = \frac{(N-1)^2 (N-k)}{k\,(N-1)} \cdot \frac{n}{N}\left(1 - \frac{n}{N}\right) \approx \frac{N-k}{k}\, n \left(1 - \frac{n}{N}\right)$$


from which the "goodness" of the estimate can be known in a non-rigorous
fashion (n is not known).


Consider the following argument for confidence estimation. If a sample
of size k contains x errors, it is most likely that the population N contains n
= INT[x(N-1)/k] errors. The probability that the population contains more
than some limit n', given that a sample of k contained x errors, can be estimated
as follows.

$$\text{Prob}(\text{actual } n > n') = \frac{\text{ways that } x \text{ errors in } k \text{ can come from all } n > n'}{\text{ways that } x \text{ errors in } k \text{ can come from any } n}$$

$$= \frac{\sum_{i=n'+1}^{N-(k-x)} \binom{i}{x}\binom{N-i}{k-x}}{\sum_{i=x}^{N-(k-x)} \binom{i}{x}\binom{N-i}{k-x}}$$

$$= \frac{1}{1 + \dfrac{\sum_{i=x}^{n'} \binom{i}{x}\binom{N-i}{k-x}}{\sum_{i=n'+1}^{N-(k-x)} \binom{i}{x}\binom{N-i}{k-x}}}$$

The only unspecified parameter in the above (given the results of a sampling)
is n', so the probability (the confidence in this case) can be computed
as a function of an upper limit on the number of errors in the population.


In the special case where no errors are observed in the sample, the
equation reduces to

$$P_r(n > 0) = \frac{\sum_{i=1}^{N-k} \binom{N-i}{k}}{\sum_{i=0}^{N-k} \binom{N-i}{k}} = \frac{N-k}{N+1}$$

For reasonable values of k, this is approximately 1 - k/N, as intuition tells
us.


Table 1 gives the probability of obtaining x errors in a sample of
size k for a population of 1000 for various numbers of errors in the population.
The column for one error in the population (n=1) is intuitive: if
there is one error, the probability of observing it equals the fraction
of the population sampled.
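Entries of the kind tabulated in Table 1 can be recomputed directly from the hypergeometric distribution. The sketch below is illustrative only; the function name is hypothetical, and it relies on Python's standard `math.comb` for the binomial coefficients.

```python
from math import comb


def prob_errors_in_sample(N, n, k, x):
    """Probability that a sample of size k contains x errors when the
    population of N items contains n errors (hypergeometric distribution)."""
    if x < 0 or x > n or x > k or k - x > N - n:
        return 0.0
    return comb(n, x) * comb(N - n, k - x) / comb(N, k)


# One error in a population of 1000, sample of 200: the chance of observing
# the error equals the fraction of the population sampled.
print(prob_errors_in_sample(1000, 1, 200, 1))
```

Summing the function over all possible values of x for fixed N, n, and k gives 1, which is a convenient check on any recomputed table column.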

Table 2 gives the probabilities of n errors existing in the population
when 0, 1, or 2 errors are observed in samples. The population is 1000
and sample sizes of 200, 500, and 900 are shown. Suppose 1/2% errors (5
in 1000) could be tolerated. Then if zero errors were observed in a sample
of size 500, the probability is 0.9847 that the number of errors in the
population is 5 or fewer (by summing from n=0 to n=5). If one error was
observed, the confidence would be 0.8917, and if two errors were observed,
the confidence would drop to 0.6577.
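The confidence figures quoted above can be approximated with a short calculation that follows the counting argument for confidence estimation. The sketch assumes, as that argument implicitly does, that every possible number of errors in the population is weighted equally before sampling; the function name is hypothetical.

```python
from math import comb


def confidence_at_most(N, k, x, n_limit):
    """Confidence that the population of N holds at most n_limit errors,
    given x errors observed in a sample of size k."""

    def ways(i):
        # Ways a sample of k with x errors can arise when the population has i errors.
        return comb(i, x) * comb(N - i, k - x)

    upper = N - (k - x)  # the population must hold at least k - x good items
    total = sum(ways(i) for i in range(x, upper + 1))
    within = sum(ways(i) for i in range(x, min(n_limit, upper) + 1))
    return within / total


# Population of 1000, sample of 500, tolerance of 5 errors.
for observed in (0, 1, 2):
    print(observed, round(confidence_at_most(1000, 500, observed, 5), 4))
```

The quantity returned is one minus the probability that the actual number of errors exceeds the limit, so it can be compared directly with a confidence threshold.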


[Table 1: probability of x errors in sample given n errors in population; tabulated values not recoverable.]
























[Table 2: probability of n errors in the population given the errors observed in the sample; population N = 1000, sample size = 200, 500, 900; tabulated values not recoverable.]




5.1.1 Adaptive Sampling 


In an automated QA system operating at, say, 90% confidence, a sample
in the above example showing one error would be below the confidence threshold.
However, 89% confidence is still good. Resampling (or sampling the
next trial of 1000) at a higher sampling rate is recommended to tighten
the variance on the estimate. Suppose that 90% sampling were performed.
Then, two observed errors would yield 99.87% confidence that five or fewer
errors were in the population. If more errors were observed (say 4 or 5),
then 100% sampling and a quality warning would be indicated. If there is
high confidence that quality is being maintained, the sampling rate would
drop back to smaller levels.
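One way such an adaptive scheme might be organized is sketched below. The ladder of sampling rates, the alarm count of 4 errors, and all names are hypothetical choices made for illustration; only the 90% threshold, the tolerance of 5 errors in 1000, and the general escalate-or-relax behavior come from the discussion above.

```python
from math import comb

RATES = (0.5, 0.9, 1.0)  # hypothetical ladder of sampling rates


def confidence_at_most(N, k, x, n_limit):
    """Confidence that at most n_limit errors exist, given x errors in a sample of k."""

    def ways(i):
        return comb(i, x) * comb(N - i, k - x)

    upper = N - (k - x)
    total = sum(ways(i) for i in range(x, upper + 1))
    within = sum(ways(i) for i in range(x, min(n_limit, upper) + 1))
    return within / total


def next_rate(N, rate, x, n_limit=5, threshold=0.90, alarm_errors=4):
    """Return (sampling rate for the next trial, quality warning flag)."""
    if x >= alarm_errors:
        return 1.0, True  # many errors: 100% sampling and a quality warning
    k = round(N * rate)
    if confidence_at_most(N, k, x, n_limit) < threshold:
        higher = [r for r in RATES if r > rate]
        return (higher[0] if higher else 1.0), False  # tighten the estimate
    return RATES[0], False  # quality is being maintained: relax sampling


print(next_rate(1000, 0.5, 1))
print(next_rate(1000, 0.9, 2))
print(next_rate(1000, 0.9, 4))
```

A production scheme would likely relax the rate gradually rather than dropping straight back to the lowest level, but that policy is not specified here.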

Such a sampling scheme is recommended whenever a small number of errors
can be tolerated and measuring errors is time-consuming or expensive.

5.2 Other Parameters


5.2.1 Costs


The breakpoint for a QA system is where the costs due to passing on products
(final and intermediate) which do not meet specifications balance
the cost of the QA system. System engineering would determine the balance between
QA costs and processing system improvements, considering throughput loading
due to the need to reprocess data. Costs due to unsatisfied errors are
difficult to know, but result from having no product, a product not meeting
specification, and late products. This trade-off is presented in Figure
5-1.


Costs can be reduced if the QA system not only gives an indication of quality
failures but also provides a measure of the urgency of the required remedial action.
A total failure requires immediate action. A quality failure probably due
to random failure (a statistical excess of errors in one product) requires
no action but reprocessing (and a logged report of reprocessing, itself an
indication of system status). A trend analysis crossing a warning threshold
requires, perhaps, preventive maintenance or a test sequence to be scheduled
at the end of an operational shift.
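The three urgency levels described above lend themselves to a simple ordered rule set. The sketch below is only illustrative: the function name, the boolean inputs, and the wording of the returned actions are hypothetical, and a real system would derive the inputs from its own quality measures.

```python
def remedial_action(total_failure, excess_errors_in_product, trend_warning):
    """Map QA findings to a remedial action, most urgent condition first."""
    if total_failure:
        return "immediate action"
    if excess_errors_in_product:
        # Probably a random failure: reprocess and record that it happened.
        return "reprocess product and log the reprocessing"
    if trend_warning:
        return "schedule preventive maintenance or test sequence at end of shift"
    return "no action"


print(remedial_action(False, True, True))
```

Ordering the checks by urgency means that a product with several simultaneous findings is always handled according to the most serious one.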

5.2.2 Other Measures 


There are benefits accruing to a QA system which reduce costs to other 
system components or provide capabilities beyond QA. 

A proper QA system with its MIS contains data for use in analyzing
and recommending changes in preventive maintenance scheduling, spares inventory
policies, operator training and, ultimately, processing system design
changes.

In reviewing the costs for an automated QA system, these benefits should
be taken into account.


FIGURE 5-1. Cost Trade-Off Model (probability that the output is acceptable vs. cost; outcomes shown: acceptable product, product late due to reprocessing, poor quality product, no product)




6.0 SURVEY OF STATE-OF-ART 


6.1 Literature Survey 

A fairly comprehensive search of the existing literature was conducted
in an effort to identify available techniques, and existing systems using
such techniques, for performing QA (automated and/or semi-automated)
of image data during ground processing. The search included,
among others, numerous on-line queries of the National Technical Information
Service's (NTIS) databases, many trips to local scientific/technical libraries,
and a thorough screening of various IEEE publications from the past 4
or 5 years. Unfortunately, the results have not been encouraging. In fact,
no technique(s) or system(s) could be identified to assist us in simplifying
the QA problem at hand.

A brief description of each of the relevant articles uncovered during
the literature search is presented below.

Antikidis [1] has attempted to show how important image quality
requirements are in the definition of an image-taking satellite system and the
associated on-board and ground processing facilities. Some measures of
image quality have been defined in the framework of a future European Space
Agency (ESA) sensing system.

Leberl and Kropatsch [2] have conducted experiments with part of a
digital Landsat image of Southern Germany to show that automatic location
of features in a digital image is feasible if recognition is supported by
a digital map database. The authors recognized 13 features in the
test scene and reported that the resulting image rectification left residual
point errors of less than ±1 pixel.

Tsuchiya and Arai [3] have suggested an approach to geometric correction
processing. Removal of geometric errors in Landsat MSS imagery in precision
processing is done using GCPs (Ground Control Points). Thus, the selection
of GCPs affects the geometric accuracy of the processed imagery. Based
on 2 years of Landsat MSS imagery, the effects of GCP feature type on matching
success rate and on the cross correlation of the two images to be
registered were studied, together with the relationship between the time lapse
separating the two images and the success rate of GCP matching. It was found that the
best GCPs in automatic matching are islands, wharves, and breakwaters,
and the best GCPs in manual matching are breakwaters, highway intersections,
and wharves. Furthermore, it was also found that breakwaters and wharves
show high cross correlation coefficients in automatic GCP matching.
There was a periodic tendency in the success rate of GCP matching, with
a prevailing period of 21 months. Between two images with a time lapse
ranging from 8 to 17 months, a symmetric tendency was found in the GCP matching
success rate, with the maximum at 12 months.

Williams, Siebert, and Gunn [4] have described an image analysis system
known as KARS. The Kansas Applied Remote Sensing (KARS) program and the Department
of Geography-Meteorology have developed an interactive digital image
processing program package that runs on the University of Kansas central
computer. The modular form and simple Fortran programming of the package
have allowed easy and rapid upgrades and extensions of its capabilities.
The package comprises subimage extraction and rectification, image
display and enhancement, and both supervised and unsupervised classification
routines. It has been used in both instructional and research settings
at the University.

A classification of multi-sensor imagery from the sensor's point of
view is advanced by Casasent and Munoz [5]. From this treatment, the statistical
and deterministic contributions to a multi-sensor image correlation
process are more clearly seen. The optimum preprocessing operations for
several cases of multi-sensor image pattern recognition are noted, and the
use of weighted matched spatial filter synthesis as a one-step optical pattern
recognition correlator is described. A theoretical formulation and experimental
verification of the result that edge enhancement preprocessing is
not always optimum in a multi-sensor optical image pattern recognition system
are presented.

Aggarwal and Panda [6] have described a system developed by Honeywell 
for analyzing the imagery automatically and detecting tactical as well as 
strategic targets in the image. The main features of the image recognition 
system are sequential frame processing, symbolic image segmentation, context- 
dependent syntactic recognition, and recognition of multi-component objects 
and conflict removal. 





7.0 CONCLUSIONS 


7.1 Recommendations


The primary recommendation from this study is that Quality Assurance
be considered in the system design from the earliest point, and that access
to the data be provided in the design for QA. This is conceptually easier
to do for modular, serial-processing systems than for highly integrated
parallel-processing designs. In the latter, QA should be addressed at whichever
level provides access to data and at whichever level fault isolation
is desired. This may be a fairly low subfunction level.

Another important recommendation is that, whatever the level of automation,
some supervision of the QA process by an analyst is required. Known
quality measures can be programmed from the start as a "knowledge based
system" (a structured set of IF-THEN statements), and, with access to the
data, additional quality measures can be added as they are discovered by
analysts. It is not cost effective to insure against every conceivable
failure; many failures in existing systems were certainly not foreseen and
would have been assigned an extremely low probability a priori had they
been considered.

Should NASA wish to pursue even more automation of QA in future systems, 
an adaptive "learning" process is recommended. Again, with access to the 
data, simple statistics and trend analyses can be calculated inexpensively. 
Analysis of the trends and development of a classification algorithm for 
QA may prove worthwhile. 


Specific recommendations are given individually in the following.

• QA must be a system level function, composed of central QA and local
QA functions which may be distributed throughout the system.

• Quality should be measured/monitored at the level of satellite design,
checkout, and in-flight control, as well as on-board and ground
processing. That is, QA must become an important element of end-to-end
satellite system design, since the question of image quality
is no longer just an instrumental concept.

System Design Impacts

• The ground system must, by design, provide access to
all data for the central QA function or process. Such access may
be provided via numerous taps into the production processing system.

• QA must have a strong interface with Production Control.

• Cost-effectiveness studies shall account for the overall QA process.

QA Process 


• Central QA should control local QA functions; local QA functions
determine and select the data to be analyzed as data progresses through
the various stages of the production process.

• Contains or has access to an MIS to track system history and status
of repair and maintenance.

• Has access to quality of input data.

QA Algorithms 

• Some algorithms may be shared by local functions: 

- Statistics 

- Trend Analyses 

- Sampling Algorithms 

• Known measures be assessed by specific calculations 
("AI" type IF-THEN calctilations) . 

• Adaptive algorithms may be included for unforeseen problems or growth 
in analysis. 

• QA should be structured so new known algorithms can be added easily. 

• The central QA function must provide quality-indicating measures 
which may later be appended to all output products before their 
dissemination to the user community. 

• All QA algorithms/parameters must be stored for TBD years (perhaps 
the life of the mission) for quick retrieval to aid in future analyses. 





• Simple QA functions, such as checking data bounds, calibration, etc., 
can and should be made adaptive. This will cause a reasonable 
increase in initial system costs but should result in cost savings 
in the long run. Such QA parameters can be computed in real-time or 
near real-time. 

• More complex QA functions, such as those needed to perform QA of 
the geometric corrections processing, may also be made adaptive. 
However, the corresponding QA algorithms will probably not run in 
real-time or near real-time. Additionally, the cost of their 
implementation would, in all likelihood, far exceed the resulting 
benefit. 

• Certain types of QA functions (such as detecting a "zipper") will 
best be performed by a human analyst, since no simple or known 
algorithms exist to detect such deficiencies by computation. 
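
Several of the recommendations above (IF-THEN checks for known measures, a structure that lets new algorithms be added easily, and adaptive data bounds) can be illustrated together. The following Python fragment is a minimal sketch under stated assumptions: the rule names, the scene dictionary layout, the 0 to 255 pixel range, and the mean plus or minus k standard deviations update are all hypothetical choices and are not taken from the study.

```python
from statistics import mean, pstdev

RULES = {}  # registry of known checks, keyed by name


def rule(name):
    """Decorator that registers a check, so new known algorithms are easy to add."""
    def wrap(fn):
        RULES[name] = fn
        return fn
    return wrap


@rule("pixel_range")
def pixel_range(scene):
    # IF any pixel lies outside the assumed 0..255 range THEN the check fails.
    return all(0 <= p <= 255 for p in scene["pixels"])


@rule("not_blank")
def not_blank(scene):
    # IF every pixel has the same value THEN the check fails.
    return max(scene["pixels"]) > min(scene["pixels"])


def run_rules(scene):
    """Apply every registered rule and return a name -> pass/fail mapping."""
    return {name: fn(scene) for name, fn in RULES.items()}


class AdaptiveBounds:
    """Data bounds that tighten around the history of accepted values."""

    def __init__(self, low, high, k=3.0, min_history=5, min_half_width=5.0):
        self.low, self.high = low, high
        self.k = k
        self.min_history = min_history
        self.min_half_width = min_half_width  # keeps the bounds from collapsing
        self.history = []

    def check(self, value):
        ok = self.low <= value <= self.high
        if ok:
            self.history.append(value)
            if len(self.history) >= self.min_history:
                m = mean(self.history)
                half = max(self.k * pstdev(self.history), self.min_half_width)
                self.low, self.high = m - half, m + half
        return ok
```

Only accepted values update the bounds in this sketch, so a single bad reading cannot widen them. The pass/fail mapping returned by run_rules is one possible form for the quality-indicating measures that could later be appended to output products.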





8.0 BIBLIOGRAPHY 

1. Antikidis, J. P., "Introduction to Image Quality Definition and 
Requirements for Remote Sensing Satellites," 14th Congress of the 
International Society of Photogrammetry, Hamburg, West Germany, 
July 1980, p. 416-425. 

2. Leberl, F. and Kropatsch, W., "Experiments With Automatic Feature 
Analysis Using Maps and Images," 14th Congress of the International 
Society of Photogrammetry, July 1980, p. 451-457. 

3. Tsuchiya, K. and Arai, B., "Some Effects On the GCP Success Rate," 
7th Canadian Symposium on Remote Sensing, Winnipeg, Canada, Sept. 1981, 
p. 497-502. 

4. Williams, T. H. Lee, Siebert, J., and Gunn, C., "The KARS Image Analysis 
System: A Low Cost Interactive System For Instruction and Research," 
Machine Processing of Remotely Sensed Data Symposium, West Lafayette, 
Indiana, June 1981, p. 178-180. 

5. Casasent, D. and Munoz, D., "Statistical and Deterministic Aspects 
of Multi-sensor Optical Image Pattern Recognition," Society of 
Photo-Optical Instrumentation Engineers, Vol. 201, 1979, p. 58-64. 

6. Aggarwal, R. K. and Panda, D. P., "Context Dependent Automatic Image 
Screening System," Society of Photo-Optical Instrumentation Engineers, 
Vol. 205, 1980, p. 85-89. 





